REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  NO.  0704-0188 




1.  REPORT  DATE  (DD-MM-YYYY) 


2.  REPORT  TYPE 

Technical  Report 


4.  TITLE  AND  SUBTITLE 

Asymptotics  of  Markov  Kernels  and  the  Tail  Chain 


3.  DATES  COVERED  (From  -  To) 


5a.  CONTRACT  NUMBER 

W911NF-10-1-0289


5b.  GRANT  NUMBER 


6.  AUTHORS 

Sidney  Resnick,  David  Zeber 


5c.  PROGRAM  ELEMENT  NUMBER 
611102 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAMES  AND  ADDRESSES 

Cornell  University 

Office  of  Sponsored  Programs 

Cornell  University 

Ithaca,  NY  14853  -2801 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND 
ADDRESS(ES) 

U.S.  Army  Research  Office 
P.O.Box  12211 

Research  Triangle  Park,  NC  27709-2211 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 
ARO 


11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

57856-MA.23 


12. DISTRIBUTION AVAILABILITY STATEMENT
Approved  for  public  release;  distribution  is  unlimited. 


13.  SUPPLEMENTARY  NOTES 

The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an
official Department of the Army position, policy or decision, unless so designated by other documentation.


14.  ABSTRACT 

An  asymptotic  model  for  extreme  behavior  of  certain  Markov  chains  is  the  "tail  chain".  Generally  taking  the  form 
of  a  multiplicative  random  walk,  it  is  useful  in  deriving  extremal  characteristics  such  as  point  process  limits.  We 
place  this  model  in  a  more  general  context,  formulated  in  terms  of  extreme  value  theory  for  transition  kernels,  and 
extend  it  by  formalizing  the  distinction  between  extreme  and  non-extreme  states.  We  make  the  link  between  the 
update  function  and  transition  kernel  forms  considered  in  previous  work,  and  we  show  that  the  tail  chain  model 


15.  SUBJECT  TERMS 

Extreme  values,  multivariate  regular  variation,  Markov  chain,  transition  kernel,  tail  chain,  heavy  tails. 


16. SECURITY CLASSIFICATION OF:
a. REPORT: UU
b. ABSTRACT: UU
c. THIS PAGE: UU

17. LIMITATION OF ABSTRACT: UU

18. NUMBER OF PAGES

19a. NAME OF RESPONSIBLE PERSON
Sidney Resnick

19b. TELEPHONE NUMBER
607-255-1210

Standard  Form  298  (Rev  8/98) 
Prescribed by ANSI Std. Z39.18




ASYMPTOTICS  OF  MARKOV  KERNELS  AND  THE  TAIL  CHAIN 


SIDNEY  I.  RESNICK  AND  DAVID  ZEBER 


Abstract. An asymptotic model for extreme behavior of certain Markov chains is the “tail chain”.
Generally taking the form of a multiplicative random walk, it is useful in deriving extremal
characteristics such as point process limits. We place this model in a more general context, formulated
in terms of extreme value theory for transition kernels, and extend it by formalizing the
distinction between extreme and non-extreme states. We make the link between the update function and
transition kernel forms considered in previous work, and we show that the tail chain model leads to
a multivariate regular variation property of the finite-dimensional distributions under assumptions
on the marginal tails alone.


1.  Introduction 

A  method  of  approximating  the  extremal  behavior  of  discrete-time  Markov  chains  is  to  use 
an  asymptotic  process  called  the  tail  chain  under  an  asymptotic  assumption  on  the  transition 
kernel  of  the  chain.  Loosely  speaking,  if  the  distribution  of  the  next  state  converges  under  some 
normalization  as  the  current  state  becomes  extreme,  then  the  Markov  chain  behaves  approximately 
as  a  multiplicative  random  walk  upon  leaving  a  large  initial  state.  This  approach  leads  to  intuitive 
extremal  models  in  such  cases  as  autoregressive  processes  with  random  coefficients,  which  include  a 
class of ARCH models. The focus on Markov kernels was introduced by Smith [24]. Perfekt [18, 19]
extended  the  approach  to  higher  dimensions,  and  Segers  [23]  rephrased  the  conditions  in  terms  of 
update  functions. 

Though  not  restrictive  in  practice,  the  previous  approach  tends  to  mask  aspects  of  the  processes’ 
extremal  behaviour.  Markov  chains  which  admit  the  tail  chain  approximation  fall  into  one  of  two 
categories.  Starting  from  an  extreme  state,  the  chain  either  remains  extreme  over  any  finite  time 
horizon,  or  will  drop  to  a  “non-extreme”  state  of  lower  order  after  a  finite  amount  of  time.  The 
latter  case  is  problematic  in  that  the  tail  chain  model  is  not  sensitive  to  possible  subsequent  jumps 
from  a  non-extreme  state  to  an  extreme  one.  Previous  developments  handle  this  by  ruling  out 
the  class  of  processes  exhibiting  this  behaviour  via  a  technical  condition,  which  we  refer  to  as  the 
regularity  condition.  Also,  most  previous  work  has  assumed  stationarity,  since  interest  focused  on 
computing  the  extremal  index  or  deriving  limits  for  the  exceedance  point  processes,  drawing  on 
the theory established for stationary processes with mixing by Leadbetter et al. However,
stationarity  is  not  fundamental  in  determining  the  extremal  behaviour  of  the  finite-dimensional 
distributions. 

We  place  the  tail  chain  approximation  in  the  context  of  an  extreme  value  theory  for  Markovian 
transition  kernels,  which  a  priori  does  not  necessitate  any  such  restrictions  on  the  class  of  processes 
to  which  it  may  be  applied.  In  particular,  we  introduce  the  concept  of  boundary  distribution,  which 
controls  tail  chain  transitions  from  non-extreme  to  extreme.  Although  distributional  convergence 
results  are  more  naturally  phrased  in  terms  of  transition  kernels,  we  treat  the  equivalent  update 
function forms as an integral component to interfacing with applications, and we phrase relevant
assumptions in terms of both. While not making explicit a complete tail chain model for the class
of chains excluded previously, we demonstrate the extent to which previous models may be viewed
as a partial approximation within our framework. This is accomplished by formalizing the division
between extreme and non-extreme states as a level we term the extremal boundary. We show
that, in general, the tail chain approximates the extremal component, the portion of the original
chain having yet to cross below this boundary. Phrased in these terms, the regularity condition
requires that the distinction between the original chain and its extremal component disappears
asymptotically.

Key words and phrases. Extreme values, multivariate regular variation, Markov chain, transition kernel, tail chain,
heavy tails.

S. I. Resnick and D. Zeber were partially supported by ARO Contract W911NF-10-1-0289 and NSA Grant
H98230-11-1-0193 at Cornell University.

After  introducing  our  extreme  value  theory  for  transition  kernels,  along  with  a  representation  in 
terms  of  update  functions,  we  derive  limits  of  finite-dimensional  distributions  conditional  on  the 
initial  state,  as  it  becomes  extreme.  We  then  examine  the  effect  of  the  regularity  condition  on  these 
results.  Finally,  adding  the  assumption  of  marginal  regularly  varying  tails  leads  to  convergence 
results  for  the  unconditional  distributions  akin  to  regular  variation. 


1.1. Notation and Conventions. We review notation and relevant concepts. If not explicitly
specified, assume that any space $\mathbb{S}$ under discussion is a topological space paired with its Borel
$\sigma$-field of open sets $\mathcal{B}(\mathbb{S})$ to form a measurable space. Denote by $\mathcal{K}(\mathbb{S})$ the collection of its compact
sets; by $C(\mathbb{S})$ the space of real-valued continuous, bounded functions on $\mathbb{S}$; and by $C_K^+(\mathbb{S})$ the space
of non-negative continuous functions with compact support. Weak convergence of probability
measures is represented by $\Rightarrow$.

For a space $\mathbb{E}$ which is locally compact with countable base (for example, a subset of $[-\infty,\infty]^d$),
$\mathbb{M}_+(\mathbb{E})$ is the space of non-negative Radon measures on $\mathcal{B}(\mathbb{E})$; point measures consisting of single
point masses at $x$ will be written as $\epsilon_x(\cdot)$. A sequence of measures $\{\mu_n\} \subset \mathbb{M}_+(\mathbb{E})$ converges
vaguely to $\mu \in \mathbb{M}_+(\mathbb{E})$ (written $\mu_n \xrightarrow{v} \mu$) if $\int_{\mathbb{E}} f\,d\mu_n \to \int_{\mathbb{E}} f\,d\mu$ as $n \to \infty$ for any $f \in C_K^+(\mathbb{E})$.
The shorthand $\mu(f) = \int f\,d\mu$ is handy. That the distribution of a random vector $X$ is regularly
varying on a cone $\mathbb{E} \subset [-\infty,\infty]^d \setminus \{0\}$ means that $t\,P[X/b(t) \in \cdot\,] \xrightarrow{v} \mu^*(\cdot)$ in $\mathbb{M}_+(\mathbb{E})$ as $t \to \infty$ for
some non-degenerate limit measure $\mu^* \in \mathbb{M}_+(\mathbb{E})$ and scaling function $b(t) \to \infty$. The limit $\mu^*$ is
necessarily homogeneous in the sense that $\mu^*(c\,\cdot) = c^{-\alpha}\mu^*(\cdot)$ for some $\alpha > 0$. The regular variation
is standard if $b(t) = t$.
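
As a small numerical illustration of this definition (a sketch only: the tail $P[X > s] = (1+s)^{-\alpha}$ on $[0,\infty)$ and the scaling $b(t) = t^{1/\alpha}$ are hypothetical choices, not taken from the text), the following Python snippet evaluates $t\,P[X/b(t) > x]$ for increasing $t$ and compares it with the homogeneous limit $x^{-\alpha}$.

```python
def scaled_tail(t, x, alpha=2.0):
    """t * P[X / b(t) > x] for the assumed tail P[X > s] = (1 + s)^(-alpha),
    with the assumed scaling function b(t) = t^(1/alpha)."""
    b = t ** (1.0 / alpha)
    return t * (1.0 + b * x) ** (-alpha)


def limit_tail(x, alpha=2.0):
    """Limit measure of the set (x, infinity]: x^(-alpha)."""
    return x ** (-alpha)


# The scaled tail approaches the limit as t grows.
for t in (1e2, 1e4, 1e8):
    print(t, scaled_tail(t, 2.0), limit_tail(2.0))
```

The limit satisfies the homogeneity relation stated above: `limit_tail(c * x)` equals `c ** (-alpha) * limit_tail(x)` up to rounding.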

If $X = (X_0, X_1, X_2, \dots)$ is a (homogeneous) Markov chain and $K$ is a Markov transition kernel,
we write $X \sim K$ to mean that the dependence structure of $X$ is specified by $K$, i.e.

$P[X_{n+1} \in \cdot \mid X_n = x] = K(x, \cdot), \quad n = 0, 1, \dots.$

We adopt the standard shorthand $P_x[(X_1, \dots, X_m) \in \cdot\,] = P[(X_1, \dots, X_m) \in \cdot \mid X_0 = x]$. Some
useful technical results are assembled in Section 8 (p. 21).


2.  Extremal  Theory  for  Markov  Kernels 


We  begin  by  focusing  on  the  Markov  transition  kernels  rather  than  the  stochastic  processes  they 
determine,  and  introduce  a  class  of  kernels  we  term  “tail  kernels,”  which  we  will  view  as  scaling 
limits  of  certain  kernels.  Antecedents  include  Segers’  [23]  definition  of  “back-and-forth  tail  chains” 
that  approximate  certain  Markov  chains  started  from  an  extreme  value. 

For a Markov chain $X \sim K$ on $[0, \infty)$, it is reasonable to expect that extremal behaviour of $X$
is determined by pairs $(X_n, X_{n+1})$, and one way to control such pairs is to assume that $(X_n, X_{n+1})$
belongs to a bivariate domain of attraction (cf. [5, 23]). In the context of regular variation, writing

(2.1)  $t\,P\left[\frac{X_n}{b(t)} \in A_0,\ \frac{X_{n+1}}{b(t)} \in A_1\right] = \int_{A_0} K\big(b(t)u,\ b(t)A_1\big)\; t\,P\left[\frac{X_n}{b(t)} \in du\right]$

suggests combining marginal regular variation of $X_n$ with a scaling kernel limit to derive extremal
properties of the finite-dimensional distributions (fdds) [18, 19, 23], and this is the direction we
take. We first discuss the kernel scaling operation.

For simplicity, we assume the state space of the Markov chain is $[0, \infty)$, although with suitable
modifications, it is relatively straightforward to extend the results to $\mathbb{R}^d$. Henceforth $G$ and $H$ will
denote probability distributions on $[0, \infty)$.


2.1. Tail Kernels. The tail kernel associated with $G$, with boundary distribution $H$, is

(2.2)  $K^*(y, A) = \begin{cases} G(y^{-1}A) & y > 0 \\ H(A) & y = 0 \end{cases}$

for any measurable set $A$. Thus, the class of tail kernels on $[0, \infty)$ is parameterized by the pair of
probability distributions $(G, H)$. Such kernels are characterized by a scaling property:


Proposition 2.1. A Markov transition kernel $K$ is a tail kernel associated with some $(G, H)$ if
and only if it satisfies the relation

(2.3)  $K(uy, A) = K(y, u^{-1}A)$

when $y > 0$ for any $u > 0$, in which case $G(\cdot) = K(1, \cdot)$. The property (2.3) extends to $y = 0$ iff
$H = \epsilon_0$.

Proof. If $K$ is a tail kernel, (2.3) follows directly from the definition. Conversely, assuming (2.3),
for $y > 0$ we can write

$K(y, A) = K(1, y^{-1}A),$

demonstrating that $K$ is a tail kernel associated with $K(1, \cdot)$ (with boundary distribution $H =
K(0, \cdot)$). To verify the second assertion, fixing $u > 0$, we must show that $H(u^{-1}\,\cdot) = H(\cdot)$ iff $H = \epsilon_0$.
On the one hand, we have $\epsilon_0(u^{-1}A) = \epsilon_0(A)$. On the other, $H(0, \infty) = \lim_{n \to \infty} H(n^{-1}, \infty) =
H(1, \infty)$, so $H(0, 1] = 0$. A similar argument shows that $H(1, \infty) = 0$ as well. □
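
The scaling relation (2.3) can be checked numerically on sets of the form $[0, a]$. The sketch below is illustrative only: the choice of $G$ as an Exponential(1) distribution and $H = \epsilon_0$ is an assumption made for the example, and the function names are hypothetical.

```python
import math


def g_cdf(x):
    """Assumed G: Exponential(1) distribution function on [0, infinity)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def h_cdf(x):
    """Assumed boundary distribution H: point mass at 0."""
    return 1.0 if x >= 0 else 0.0


def tail_kernel_cdf(y, a):
    """K*(y, [0, a]) as in (2.2): G(y^{-1}[0, a]) for y > 0 and H([0, a]) for y = 0."""
    return g_cdf(a / y) if y > 0 else h_cdf(a)


# Relation (2.3): K*(u*y, [0, a]) should equal K*(y, u^{-1}[0, a]) = K*(y, [0, a/u]).
y, u, a = 2.0, 3.0, 5.0
print(tail_kernel_cdf(u * y, a), tail_kernel_cdf(y, a / u))
```

With $H = \epsilon_0$ the relation also holds at $y = 0$, in line with the last assertion of Proposition 2.1.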


We call the Markov chain $T \sim K^*$ a tail chain associated with $(G, H)$. Such a chain can be
represented as

(2.4)  $T_n = \xi_n\, T_{n-1} + \xi'_n\, \mathbf{1}_{\{T_{n-1} = 0\}}$ for $n = 1, 2, \dots,$

where $\xi_n \sim G$ and $\xi'_n \sim H$ are independent of each other and of $T_0$. If $H = \epsilon_0$, then $T$ becomes a
multiplicative random walk with step distribution $G$ and absorbing barrier at $\{0\}$: $T_n = T_0\,\xi_1 \cdots \xi_n$.
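
The recursion (2.4) translates directly into a simulation. The following Python sketch is a minimal illustration under stated assumptions: the particular $G$ (mass 0.3 at zero, lognormal otherwise) and the absorbing choice $H = \epsilon_0$ are hypothetical, as are all function names.

```python
import random


def tail_chain_path(t0, n_steps, draw_g, draw_h, rng):
    """Simulate T_0, ..., T_n via T_n = xi_n * T_{n-1} + xi'_n * 1{T_{n-1} = 0}."""
    path = [t0]
    for _ in range(n_steps):
        prev = path[-1]
        xi = draw_g(rng)         # multiplicative step drawn from G
        xi_prime = draw_h(rng)   # boundary draw from H, used only when prev == 0
        path.append(xi * prev + (xi_prime if prev == 0 else 0.0))
    return path


def draw_g(rng):
    """Assumed G: mass 0.3 at zero, otherwise standard lognormal."""
    return 0.0 if rng.random() < 0.3 else rng.lognormvariate(0.0, 1.0)


def draw_h_absorbing(rng):
    """Assumed H: point mass at 0, so that {0} is absorbing."""
    return 0.0


rng = random.Random(0)
path = tail_chain_path(1.0, 50, draw_g, draw_h_absorbing, rng)
print(path[:5])
```

With this choice of $H$ the simulated path stays at zero once it reaches zero; replacing `draw_h_absorbing` by a sampler from a non-degenerate $H$ lets the chain leave $\{0\}$ again.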


2.2. Convergence to Tail Kernels. The tail chain approximates the behaviour of a Markov
chain $X \sim K$ in extreme states. Asymptotic results require that the normalized distribution of $X_1$
be well-approximated by some distribution $G$ when $X_0$ is large, and we interpret this requirement
as a domain of attraction condition for kernels.


Definition. A Markov transition kernel $K : [0, \infty) \times \mathcal{B}[0, \infty) \to [0, 1]$ is in the domain of attraction
of $G$, written $K \in D(G)$, if as $t \to \infty$,

(2.5)  $K(t, t\,\cdot) \Rightarrow G(\cdot)$ on $[0, \infty]$.


Note that $D(G)$ contains at least the class of tail kernels associated with $G$ (i.e. with any boundary
distribution $H$). A simple scaling argument (apply (2.5) along $s = tu$ and note that
$K(tu, t\,\cdot) = K(s, s\,u^{-1}\,\cdot)$) extends (2.5) to

(2.6)  $K(tu, t\,\cdot) \Rightarrow G(u^{-1}\,\cdot) =: K^*(u, \cdot), \quad u > 0,$


where $K^*$ is any tail kernel associated with $G$; this is the form appearing in (2.1). Thus tail kernels
are scaling limits for kernels in a domain of attraction. In fact, tail kernels are the only possible
limits:

Proposition 2.2. Let $K$ be a transition kernel and $H$ be an arbitrary distribution on $[0, \infty)$. If
for each $u > 0$ there exists a distribution $G_u$ such that $K(tu, t\,\cdot) \Rightarrow G_u(\cdot)$ as $t \to \infty$, then the
function $\widehat{K}$ defined on $[0, \infty) \times \mathcal{B}[0, \infty)$ as

$\widehat{K}(u, A) := \begin{cases} G_u(A) & u > 0 \\ H(A) & u = 0 \end{cases}$

is a tail kernel associated with $G_1$.


Proof. It suffices to show that $G_u(\cdot) = G_1(u^{-1}\,\cdot)$ for any $u > 0$. But this follows directly from the
uniqueness of weak limits, since (2.6) shows that $K(tu, t\,\cdot) \Rightarrow G_1(u^{-1}\,\cdot)$. □


A  version  of  (2.6)  uniform  in  u  is  needed  for  fdd  convergence  results. 


Proposition 2.3. Suppose $K \in D(G)$, and $K^*$ is a tail kernel associated with $G$. Then, for any
$u > 0$ and any non-negative function $u_t = u(t)$ such that $u_t \to u$ as $t \to \infty$, we have

(2.7)  $K(tu_t, t\,\cdot) \Rightarrow K^*(u, \cdot) \quad (t \to \infty).$

Proof. Suppose $u_t \to u > 0$. Observe that $K(tu_t, t\,\cdot) = K(tu_t, (tu_t)\,u_t^{-1}\,\cdot)$, and put $h_t(x) = u_t x$,
$h(x) = ux$. Writing $P_t(\cdot) = K(tu_t, tu_t\,\cdot)$, we have

$K(tu_t, t\,\cdot) = P_t \circ h_t^{-1} \Rightarrow G \circ h^{-1} = G(u^{-1}\,\cdot) = K^*(u, \cdot)$

by [2, Theorem 5.5, p. 34]. □

The measure $G$ controls $X$ upon leaving an extreme state, and $H$ describes the possibility of
jumping from a non-extreme state to an extreme one. The traditional assumption (2.5) provides
no information about $H$, and in fact (2.7) may fail if $u = 0$; see Example 6.2. However, the choice
of $H$ cannot be ignored if 0 is an accessible point of the state space, especially for cases where
$G(\{0\}) = K^*(y, \{0\}) > 0$. We propose pursuing implications of the traditional assumption (2.5)
alone, and will add conditions as needed to understand boundary behaviour of $X$.

Alternative, more general formulations of (2.5) include replacing $K(t, t\,\cdot)$ with $K(t, a(t)\,\cdot)$ or
$K(t, a(t)\,\cdot + b(t))$ with appropriate functions $a(t) > 0$ and $b(t)$, in analogy with the usual domains
of attraction conditions in extreme value theory. Indeed, the second choice coincides with the
original presentation by Perfekt, and relates to the conditional extreme value model.
For clarity, and to maintain ties with regular variation, we retain the standard choice $a(t) = t$,
$b(t) = 0$.


2.3. Representation. How do we characterize kernels belonging to $D(G)$? From (2.4), for chains
transitioning according to a tail kernel, the next state is a random multiple of the previous one,
provided the prior state is non-zero. We expect that chains transitioning according to $K \in D(G)$
behave approximately like this upon leaving a large state, and this is best expressed in terms of a
function describing how a new state depends on the prior one.

Given a kernel $K$, we can always find a sample space $E$, a measurable function $\psi : [0, \infty) \times E \to
[0, \infty)$ and an $E$-valued random element $V$ such that $\psi(y, V) \sim K(y, \cdot)$ for all $y$. Given a random
variable $X_0$, if we define the process $X = (X_0, X_1, X_2, \dots)$ recursively as

$X_{n+1} = \psi(X_n, V_{n+1}), \quad n \ge 0,$

where $\{V_n\}$ is an iid sequence equal in distribution to $V$ and independent of $X_0$, then $X$ is a Markov
chain with transition kernel $K$. Call the function $\psi$ an update function corresponding to $K$. If in
addition $K \in D(G)$, the domain of attraction condition (2.5) becomes

$\frac{\psi(t, V)}{t} \Rightarrow \xi,$

where $\xi \sim G$. Applying the probability integral transform or the Skorohod representation theorems
[3, Theorem 3.2, p. 6], [1, Theorem 6.7, p. 70], we get the following result.

Proposition 2.4. If $K$ is a transition kernel, $K \in D(G)$ if and only if there exists a measurable
function $\psi^* : [0, \infty) \times [0, 1] \to [0, \infty)$ and a random variable $\xi^* \sim G$ on the uniform probability
space $([0, 1], \mathcal{B}, \lambda)$ such that

(2.8)  $t^{-1}\psi^*(t, u) \to \xi^*(u) \quad \forall u \in [0, 1]$

as $t \to \infty$, and $\psi^*$ is an update function corresponding to $K$ in the sense that

$\lambda[\psi^*(y, \cdot) \in A] = K(y, A)$

for measurable sets $A$.


Think of the update function as $\psi^*(y, U)$ where $U(u) = u$ is a uniform random variable on $[0, 1]$.


Proof. If there exist such $\psi^*$ and $\xi^*$ satisfying (2.8) then clearly $K \in D(G)$. Conversely, suppose
$\psi(\cdot, V)$ is an update function corresponding to $K$. According to Skorohod's representation theorem
(cf. Billingsley [1] p. 70, with the necessary modifications to allow for an uncountable index set),
there exists a random variable $\xi^*$ and a stochastic process $\{Y_t^* : t \ge 0\}$ defined on the uniform
probability space $([0, 1], \mathcal{B}, \lambda)$, taking values in $[0, \infty)$, such that

$\xi^* \sim G, \quad Y_0^* \stackrel{d}{=} \psi(0, V), \quad Y_t^* \stackrel{d}{=} \psi(t, V)/t$ for $t > 0,$

and $Y_t^*(u) \to \xi^*(u)$ as $t \to \infty$ for every $u \in [0, 1]$. Now, define $\psi^* : [0, \infty) \times [0, 1] \to [0, \infty)$ as

$\psi^*(0, u) = Y_0^*(u)$ and $\psi^*(t, u) = t\,Y_t^*(u), \quad t > 0, \quad \forall u \in [0, 1].$

It is evident that $\lambda[\psi^*(y, \cdot) \in A] = P[\psi(y, V) \in A]$ for $y \in [0, \infty)$, so $\psi^*$ is indeed an update function
corresponding to $K$, and $\psi^*$ satisfies (2.8) by construction. □
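
A concrete instance of the representation in Proposition 2.4 can be written down when the kernel has an explicit quantile function. The sketch below is illustrative only and rests on a hypothetical kernel not taken from the text: $K(y, \cdot)$ is assumed to be the exponential distribution with mean $1 + y$, so that the probability integral transform gives $\psi^*(y, u) = -(1 + y)\log(1 - u)$ and the pointwise limit in (2.8) is $\xi^*(u) = -\log(1 - u)$, an Exponential(1) variable on the uniform space. The formula is evaluated for $u < 1$ only; the excluded point $u = 1$ has Lebesgue measure zero.

```python
import math


def psi_star(y, u):
    """Quantile of the assumed kernel K(y, .) = Exponential(mean 1 + y) at level u < 1."""
    return -(1.0 + y) * math.log1p(-u)


def xi_star(u):
    """Pointwise limit of psi_star(t, u) / t: the Exponential(1) quantile at level u < 1."""
    return -math.log1p(-u)


# Pointwise convergence as in (2.8): psi_star(t, u) / t approaches xi_star(u).
for t in (1e1, 1e3, 1e6):
    print(t, [psi_star(t, u) / t - xi_star(u) for u in (0.1, 0.5, 0.9)])
```

Here the gap $t^{-1}\psi^*(t, u) - \xi^*(u)$ equals $\xi^*(u)/t$, so the convergence is pointwise in $u$ but not uniform near $u = 1$.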

Update functions corresponding to $K$ are not unique, and some of them may fail to converge
pointwise as in (2.8). However (2.8) is convenient, and Proposition 2.4 shows that Segers' [23,
Condition 2.2] in terms of update functions is equivalent to our weak convergence formulation
$K \in D(G)$.

Pointwise convergence in (2.8) gives an intuitive representation of kernels in a domain of
attraction.


Corollary 2.1. $K \in D(G)$ iff there exists a random variable $\xi \sim G$ defined on the uniform
probability space, and a measurable function $\phi : [0, \infty) \times [0, 1] \to (-\infty, \infty)$ satisfying $t^{-1}\phi(t, u) \to 0$
for all $u \in [0, 1]$ such that

(2.9)  $\psi(y, u) := \xi(u)\,y + \phi(y, u)$

is an update function corresponding to $K$.

Proof. If such $\xi$ and $\phi$ exist, then $t^{-1}\psi(t, u) = \xi(u) + t^{-1}\phi(t, u) \to \xi(u)$ for all $u$, so $\psi$ satisfies
(2.8). The converse follows from (2.8) upon setting $\phi(y, u) := \psi^*(y, u) - \xi^*(u)\,y$. □

Many Markov chains such as ARCH, GARCH and autoregressive processes are specified by
structured recursions that allow quick recognition of update functions corresponding to kernels in
a domain of attraction. A common example is the update function $\psi(y, (Z, W)) = Zy + W$, which
behaves like $\psi'(y, Z) = Zy$ when $y$ is large; compare $\psi'$ to the form (2.4) discussed for tail kernels.
In general, if $K$ has an update function $\psi$ of the form

(2.10)  $\psi(y, (Z, W)) = Zy + \phi(y, W)$

for a random variable $Z \ge 0$ and a random element $W$, where $t^{-1}\phi(t, w) \to 0$ whenever $w \in C$
for which $P[W \in C] = 1$, then $K \in D(G)$ with $G = P[Z \in \cdot\,]$. We will refer to update functions
satisfying (2.10) as being in canonical form.
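
To make the canonical form concrete, the following Python sketch simulates the recursion $X_{n+1} = Z_{n+1} X_n + W_{n+1}$, i.e. (2.10) with $\phi(y, w) = w$. It is an illustration under assumptions of our own: the distributions of $Z$ (uniform on $(0, 2)$) and $W$ (Exponential(1)) and all function names are hypothetical.

```python
import random


def psi(y, z, w):
    """Update function in canonical form with phi(y, w) = w: next state is z*y + w."""
    return z * y + w


def simulate_chain(x0, n_steps, draw_z, draw_w, rng):
    """Iterate X_{n+1} = psi(X_n, (Z_{n+1}, W_{n+1})) starting from x0."""
    path = [x0]
    for _ in range(n_steps):
        path.append(psi(path[-1], draw_z(rng), draw_w(rng)))
    return path


rng = random.Random(1)
z, w = rng.uniform(0.0, 2.0), rng.expovariate(1.0)

# For a fixed innovation (z, w), psi(t, (z, w)) / t differs from z by w / t.
gaps = [abs(psi(t, z, w) / t - z) for t in (1e1, 1e3, 1e5)]
print(gaps)
```

For a fixed innovation the normalized next state differs from the multiplier $Z$ by $W/t$, which is the sense in which the additive term becomes negligible when the current state is large.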

3. Finite-Dimensional Convergence and the Extremal Component


Given a Markov chain $X \sim K \in D(G)$, we show that the finite-dimensional distributions (fdds)
of $X$, started from an extreme state, converge to those of the tail chain $T$ defined in (2.4). We
initially develop results that depend only on $G$ (but not $H$), and then clarify what behaviour of $X$
is controlled by $G$ and $H$ respectively. We make explicit links with prior work that did not consider
the notion of boundary distribution.

If $G(\{0\}) = 0$, the choice of $H$ is inconsequential, since $P[T \text{ eventually hits } \{0\}] = 0$ and $T$ is
indistinguishable from the multiplicative random walk $\{T_n^* = T_0\,\xi_1 \cdots \xi_n,\ n \ge 0\}$ (where $T_0 > 0$
and $\{\xi_n\}$ are iid $\sim G$ and independent of $T_0$). In this case, assume without loss of generality
that $H = \epsilon_0$. However, if $G(\{0\}) > 0$, any result not depending on $H$ must be restricted to
fdds conditional on the tail chain not having yet hit $\{0\}$. For example, consider the trajectory of
$(X_1, \dots, X_m)$, started from $X_0 = t$, through the region $(t, \infty)^{m-2} \times [0, \delta] \times (t, \infty)$, where $t$ is a high
level. The tail chain would model this as a path through $(0, \infty)^{m-2} \times \{0\} \times (0, \infty)$, which requires
specifying $H$ to control transitions away from $\{0\}$.

This  raises  the  question  of  how  to  interpret  the  first  hitting  time  of  {0}  for  T  in  terms  of  the 
original  Markov  chain  X .  Such  hitting  times  are  important  in  the  study  of  Markov  chain  point 
process  models  of  exceedance  clusters  based  on  the  tail  chain.  Intuitively,  a  transition  to  {0}  by  T 
represents a transition from an extreme state to a non-extreme state by $X$. We make this notion
precise in Section 3.2 by viewing such transitions as downcrossings of a certain level we term the
“extremal boundary.”

We assume $X$ is a Markov chain on $[0, \infty)$ with transition kernel $K \in D(G)$, $K^*$ is a tail kernel
associated with $G$ with unspecified boundary distribution $H$, and $T$ is a Markov chain on $[0, \infty)$
with kernel $K^*$. The finite-dimensional distributions of $X$, conditional on $X_0 = y$, are given by

$P_y[(X_1, \dots, X_m) \in (dx_1, \dots, dx_m)] = K(y, dx_1)\,K(x_1, dx_2) \cdots K(x_{m-1}, dx_m),$

and analogously for $T$.


3.1. FDDs Conditional on the Initial State. Define the conditional distributions

(3.1)  $\pi_m^{(t)}(u, \cdot) = P_{tu}\left[\left(\frac{X_1}{t}, \dots, \frac{X_m}{t}\right) \in \cdot\,\right]$ and $\pi_m(u, \cdot) = P_u[(T_1, \dots, T_m) \in \cdot\,], \quad m \ge 1,$

on $[0, \infty) \times \mathcal{B}[0, \infty]^m$. We consider when $\pi_m^{(t)} \Rightarrow \pi_m$ on $[0, \infty]^m$ pointwise in $u$. If $G(\{0\}) = 0$, this is
a direct consequence of the domain of attraction condition (2.5), but if $G(\{0\}) > 0$, more thought
is required. We begin by restricting the convergence to the smaller space $\mathbb{E}'_m := (0, \infty]^{m-1} \times [0, \infty]$.
Relatively compact sets in $\mathbb{E}'_m$ are contained in rectangles $[\mathbf{a}, \infty] \times [0, \infty]$, where $\mathbf{a} \in (0, \infty)^{m-1}$.


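Theorem 3.1. Let $u_t = u(t)$ be a non-negative function such that $u_t \to u > 0$ as $t \to \infty$.

(a) The restrictions to $\mathbb{E}'_m$,

(3.2)  $\mu_m^{(t)}(u, \cdot) := \pi_m^{(t)}(u, \cdot \cap \mathbb{E}'_m)$ and $\mu_m(u, \cdot) := \pi_m(u, \cdot \cap \mathbb{E}'_m),$

satisfy

(3.3)  $\mu_m^{(t)}(u_t, \cdot) \xrightarrow{v} \mu_m(u, \cdot)$ in $\mathbb{M}_+(\mathbb{E}'_m) \quad (t \to \infty).$

(b) If $G(\{0\}) = 0$, we have

(3.4)  $\pi_m^{(t)}(u_t, \cdot) \Rightarrow \pi_m(u, \cdot)$ on $[0, \infty]^m \quad (t \to \infty).$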


Proof. The Markov structure suggests an induction argument facilitated by Lemma 8.2 (p. 21).
Consider (a) first. If $m = 1$, then (3.3) above reduces to (2.7). Assume $m \ge 2$, and let $f \in C_K^+(\mathbb{E}'_m)$.
Writing $\mathbb{E}'_m = (0, \infty] \times \mathbb{E}'_{m-1}$, we can find $a > 0$ and $B \in \mathcal{K}(\mathbb{E}'_{m-1})$ such that $f$ is supported on
$[a, \infty] \times B$. Now, observe that

$\mu_m^{(t)}(u_t, \cdot)(f) = \int_{(0,\infty]} K(tu_t, t\,dx_1) \int_{\mathbb{E}'_{m-1}} K(tx_1, t\,dx_2) \cdots K(tx_{m-1}, t\,dx_m)\, f(x_1, \dots, x_m)$

$\qquad = \int_{(0,\infty]} K(tu_t, t\,dx_1) \int_{\mathbb{E}'_{m-1}} \mu_{m-1}^{(t)}(x_1, d(x_2, \dots, x_m))\, f(x_1, \dots, x_m).$

Defining

$h_t(v) = \int_{\mathbb{E}'_{m-1}} \mu_{m-1}^{(t)}(v, d\mathbf{x}_{m-1})\, f(v, \mathbf{x}_{m-1})$ and $h(v) = \int_{\mathbb{E}'_{m-1}} \mu_{m-1}(v, d\mathbf{x}_{m-1})\, f(v, \mathbf{x}_{m-1}),$

the previous expression becomes

$\mu_m^{(t)}(u_t, \cdot)(f) = \int_{(0,\infty]} K(tu_t, t\,dv)\, h_t(v).$

Now, suppose $v_t \to v > 0$: we verify

(3.5)  $h_t(v_t) \to h(v).$

By continuity, we have $f(v_t, \mathbf{x}^t_{m-1}) \to f(v, \mathbf{x}_{m-1})$ whenever $\mathbf{x}^t_{m-1} \to \mathbf{x}_{m-1}$, and the induction
hypothesis provides $\mu_{m-1}^{(t)}(v_t, \cdot) \xrightarrow{v} \mu_{m-1}(v, \cdot)$. Also, $f(x, \cdot)$ has compact support $B$ (without loss
of generality, $\mu_{m-1}(v, \partial B) = 0$). Combining these facts, (3.5) follows from Lemma 8.2 (b). Next,
since the $h_t$ and $h$ have common compact support $[a, \infty]$, and recalling from Proposition 2.3 that
$K(tu_t, t\,\cdot) \Rightarrow K^*(u, \cdot)$, Lemma 8.2 (a) yields

$\mu_m^{(t)}(u_t, \cdot)(f) \to \int_{(0,\infty]} K^*(u, dv)\, h(v) = \mu_m(u, \cdot)(f).$

Implication (b) follows from essentially the same argument. For $m \ge 2$, suppose $f \in C[0, \infty]^m$.
Replacing $\mu$ by $\pi$ and $\mathbb{E}'_{m-1}$ by $[0, \infty]^{m-1}$ in the definitions of $h_t$ and $h$, we have

$\pi_m^{(t)}(u_t, \cdot)(f) = \int_{[0,\infty]} K(tu_t, t\,dv)\, h_t(v).$

This time Lemma 8.2 (a) shows that $h_t(v_t) \to h(v)$ if $v_t \to v > 0$, and since $K^*(u, (0, \infty]) = 1$,
resorting to Lemma 8.2 (a) once more yields

$\pi_m^{(t)}(u_t, \cdot)(f) \to \int_{[0,\infty]} K^*(u, dv)\, h(v) = \pi_m(u, \cdot)(f).$ □


If $G(\{0\}) > 0$, then $K^*(u, (0, \infty]) = 1 - G(\{0\}) < 1$, and for (3.4) to hold would require knowing
the behaviour of $h_t(v_t)$ when $v_t \to 0$ as well. Behaviour near zero is controlled by an asymptotic
condition related to the boundary distribution $H$. Previous work handled this using the regularity
condition discussed in a later section.
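
The convergence in Theorem 3.1 (b) can be visualized by driving the original chain and the tail chain with the same multipliers. The sketch below is a coupling illustration under assumptions of our own, not the argument of the proof: the model $X_n = Z_n X_{n-1} + W_n$ with $Z_n$ uniform on $(0.5, 1.5)$ (so that $G(\{0\}) = 0$) and $W_n$ Exponential(1) is hypothetical, as are the function names.

```python
import random


def coupled_paths(u, t, m, rng):
    """Return (X_1/t, ..., X_m/t) started from X_0 = t*u and (T_1, ..., T_m) started
    from T_0 = u, both driven by the same multipliers Z_n.
    Assumed model: X_n = Z_n * X_{n-1} + W_n and T_n = Z_n * T_{n-1}."""
    x, tail = t * u, u
    xs, ts = [], []
    for _ in range(m):
        z = rng.uniform(0.5, 1.5)   # Z ~ G with G({0}) = 0
        w = rng.expovariate(1.0)    # additive noise, negligible at scale t
        x = z * x + w
        tail = z * tail
        xs.append(x / t)
        ts.append(tail)
    return xs, ts


rng = random.Random(2)
for t in (1e1, 1e3, 1e5):
    xs, ts = coupled_paths(1.0, t, 3, rng)
    print(t, max(abs(a - b) for a, b in zip(xs, ts)))
```

Under this coupling the discrepancy between the normalized path and the tail chain path is a fixed random quantity divided by $t$, so it vanishes as the initial level grows. When $G(\{0\}) > 0$ the same coupling breaks down after the first step with $Z_n = 0$, which is the situation addressed by the boundary distribution.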

3.2. The Extremal Boundary. The normalization employed in the domain of attraction
condition (2.5) suggests that, starting from a large state $t$, the extreme states are approximately scalar
multiples of $t$. For example, we would consider a transition from $t$ into $(t/3, 2t]$ to remain extreme.
Thus, we think of states which can be made smaller than $t\delta$ for any $\delta > 0$, if $t$ is large enough, as
non-extreme. In this context, the set $[0, \sqrt{t}\,]$ would consist of non-extreme states.

Under (2.5), a tail chain path through $(0, \infty)$ models the original chain $X$ travelling among
extreme states, and all of the non-extreme states are compacted into the state $\{0\}$ in the state space
of $T$. Therefore, if $X$ is started from an extreme state, the portion of the tail chain depending solely
on $G$ is informative up until the first time $X$ crosses down to a non-extreme state. If $G(\{0\}) = 0$,
such a transition would become more and more unlikely as the initial state increases, in which case
$G$ provides a complete description of the behaviour of $X$ in any finite number of steps following a
visit to an extreme state (Theorem 3.1 (b)).

Drawing  upon  this  interpretation,  we  develop  a  rigorous  formulation  of  the  distinction  between 
extreme  and  non-extreme  states,  and  we  recast  Theorem  3.1  as  convergence  on  the  unrestricted 
space  [0,  oo]m  of  the  conditional  fdds,  given  that  X  has  not  yet  reached  a  non-extreme  state. 

Definition. Suppose $K \in D(G)$. An extremal boundary for $K$ is a non-negative function $y(t)$
defined on $[0, \infty)$, satisfying $\lim_{t \to \infty} y(t) = 0$ and

(3.6)  $K(t, t\,[0, y(t)]) \to G(\{0\})$ as $t \to \infty$.

Such a function is guaranteed to exist by Lemma 8.5 (p. 23).

If $G(\{0\}) = 0$, then $y(t) \equiv 0$ is a trivial choice. For any function $0 \le y(t) \to 0$, we have
$\limsup_{t \to \infty} K(t, t\,[0, y(t)]) \le G(\{0\})$ (since eventually $y(t) \le \delta$ for any $\delta > 0$, and $G[0, \delta] \downarrow G(\{0\})$
as $\delta \downarrow 0$), so (3.6) is equivalent to

(3.7)  $\liminf_{t \to \infty} K(t, t\,[0, y(t)]) \ge G(\{0\}).$

If $y(t)$ is an extremal boundary, it follows that any function $0 \le \tilde{y}(t) \to 0$ with $\tilde{y}(t) \ge y(t)$ for $t \ge t_0$
is also an extremal boundary for $K$. Taking $\tilde{y}(t) = \bigvee_{s \ge t} y(s)$ shows that without loss of generality,
we can assume $y(t)$ to be non-increasing.

The extremal boundary has a natural formulation in terms of the update function. As in (2.10),
let $\psi(y, (Z, W)) = Zy + \phi(y, W)$ be an update function in canonical form, where $y$ is extreme. If
$Z > 0$ then the next state is approximately $Zy$, another extreme state. Otherwise, if $Z = 0$, the
next state is $\phi(y, W)$, and a transition from an extreme to a non-extreme state has taken place.
This suggests choosing an extremal boundary whose order is between $t$ and $\phi(t, w)$.


Proposition 3.1. Suppose $\psi(y, (Z, W))$ is an update function in canonical form as in (2.10). If
$\zeta(t) > 0$ is a function on $[0, \infty)$ such that

(3.8)  $\phi(t, w)/\zeta(t) \to 0$

as $t \to \infty$ whenever $w \in B$ for which $P[W \in B] = 1$, then $\liminf_{t\to\infty} K(t, [0, \zeta(t)]) \ge G(\{0\})$.
Provided $\lim_{t\to\infty} \zeta(t)/t = 0$, an extremal boundary is given by $y(t) := \zeta(t)/t$.

Thus if $\phi(t, w) = o(\zeta(t))$ and $\zeta(t) = o(t)$ then $\zeta(t)/t$ is an extremal boundary. For example, if
$\psi(y, (Z, W)) = Zy + W$, so that $\phi(t, w) = w$, then choosing $\zeta(t)$ to be any function $\zeta(t) \to \infty$ such
that $\zeta(t) = o(t)$ makes $\zeta(t)/t$ an extremal boundary. Choosing $\zeta(t) = \sqrt{t}$, we find that $y(t) = 1/\sqrt{t}$
is an extremal boundary.
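
As an informal numerical illustration of this example (not part of the original development), the following sketch estimates $K(t, [0, \sqrt{t}])$ by simulation for the update function $\psi(y, (Z, W)) = Zy + W$. The specific laws are assumptions made only for the illustration: $Z$ has an atom of mass 0.3 at zero and is otherwise uniform on (0.5, 1.5), and $W$ is standard exponential, so that $G(\{0\}) = 0.3$. The function name `estimate_boundary_mass` is hypothetical.

```python
import math
import random

def estimate_boundary_mass(t, p_zero=0.3, n=20000, seed=0):
    """Monte Carlo estimate of K(t, [0, zeta(t)]) with zeta(t) = sqrt(t),
    for the update function psi(y, (Z, W)) = Z*y + W."""
    rng = random.Random(seed)
    zeta = math.sqrt(t)
    hits = 0
    for _ in range(n):
        # Assumed law of Z: atom at 0 with mass p_zero, else uniform on (0.5, 1.5).
        z = 0.0 if rng.random() < p_zero else rng.uniform(0.5, 1.5)
        # Assumed law of W: standard exponential.
        w = rng.expovariate(1.0)
        if z * t + w <= zeta:
            hits += 1
    return hits / n

# The estimates should approach G({0}) = 0.3 as t grows.
for t in (1e1, 1e3, 1e6):
    print(t, estimate_boundary_mass(t))
```

For small $t$ the estimate falls short of 0.3 because $W$ itself may exceed $\sqrt{t}$; this gap vanishes as $t$ grows, which is the content of (3.8) in this special case.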

Proof. Since

$P[\psi(t, V) \le \zeta(t),\, Z = 0] = P[\phi(t, W) \le \zeta(t),\, Z = 0] \ge P[|\phi(t, W)| \le \zeta(t),\, Z = 0]$

$\ge P[Z = 0] - P[\,|\phi(t, W)|/\zeta(t) > 1\,] \to P[Z = 0]$,

we have

$\liminf_{t\to\infty} K(t, [0, \zeta(t)]) = \liminf_{t\to\infty} P[\psi(t, V) \le \zeta(t)] \ge P[Z = 0]$.  □


We will need an extremal boundary for which (3.6) still holds upon replacing the initial state $t$
with $t u_t$, where $u_t \to u > 0$. Compare the following extension with Proposition 2.3.


Proposition 3.2. If $K \in D(G)$, then there exists an extremal boundary $y^*(t)$ such that

(3.9)  $K(t u_t, t[0, y^*(t)]) \to G(\{0\})$ as $t \to \infty$

for any non-negative function $u_t = u(t) \to u > 0$.

We will refer to $y^*$ as a uniform extremal boundary.


Proof. Let $y(t)$ be an extremal boundary for $K$, assumed non-increasing without loss of generality. As a first step, fix $u_0 > 1$, and suppose $u_0^{-1} <
u < u_0$. Define $\tilde{y}(t) = u_0\, y(t u_0^{-1})$. Now, if $u_t \to u$, then $y_{\{u\}}(t) := u_t\, y(t u_t)$ satisfies (3.9), since

$K(t u_t, t[0, y_{\{u\}}(t)]) = K(t u_t, t u_t [0, y(t u_t)]) \to G(\{0\})$.

Here $y_{\{u\}}$ depends on the choice of function $u_t$. However, since we eventually have $u_0^{-1} < u_t < u_0$
for $t$ large enough, it follows that $\tilde{y}(t) \ge y_{\{u\}}(t)$ for such $t$. Hence, $\tilde{y}(t)$ satisfies (3.9) for any $u_t \to u$
with $u_0^{-1} < u < u_0$.

Next, we remove the restriction in $u_0$ via a diagonalization argument. For $k = 2, 3, \ldots$, let $y_k(t)$
be extremal boundaries such that $K(t u_t, t[0, y_k(t)]) \to G(\{0\})$ whenever $u_t \to u$ for $u \in (k^{-1}, k)$,
and put $y_0 = y_1 = y$. Next, define the sequence $\{(s_k, x_k) : k = 0, 1, \ldots\}$ inductively as follows.
Setting $s_0 = 0$ and $x_0 = y_0(0)$, choose $s_k \ge s_{k-1} + 1$ such that $y_j(t) \le k^{-1} \wedge x_{k-1}$ for all $j = 0, \ldots, k$
whenever $t \ge s_k$, and put $x_k = \max\{y_j(s_k) : j = 0, \ldots, k\}$. Note that $x_k \le k^{-1} \wedge x_{k-1}$, so $x_k \downarrow 0$,
and $s_k \uparrow \infty$. Finally, set

$y^*(t) = \sum_{k=0}^{\infty} x_k\, 1_{[s_k, s_{k+1})}(t)$.

Observe that $0 \le y^*(t) \downarrow 0$, and suppose $u_t \to u > 0$. Then $u \in (k_0^{-1}, k_0)$ for some $k_0$, so
$K(t u_t, t[0, y_{k_0}(t)]) \to G(\{0\})$, and for $k \ge k_0$, our construction ensures that whenever $s_k \le t <
s_{k+1}$, we have $y_{k_0}(t) \le y_{k_0}(s_k) \le x_k = y^*(t)$. Therefore, $y^*(t) \ge y_{k_0}(t)$ for $t \ge s_{k_0}$, so $y^*$ satisfies
(3.9).  □


Henceforth, we assume any $K \in D(G)$ is accompanied by a uniform extremal boundary denoted
by $y(t)$, and we consider extreme states on the order of $t$ to be $(t y(t), \infty]$. If $G(\{0\}) = 0$, then
all positive states are extreme states. We now use the extremal boundary to reformulate the
convergence of Theorem 3.1 on the larger space $[0, \infty]^m$. Put $E'_m(t) = (y(t), \infty]^{m-1} \times [0, \infty]$, so
that $E'_m(t) \uparrow E'_m = (0, \infty]^{m-1} \times [0, \infty]$. Recall the notation $\mu_m$ and $\mu_m^{(t)}$ from (3.1), (3.2) in Theorem
3.1.


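Theorem 3.2. Let $u_t = u(t)$ be a non-negative function such that $u_t \to u > 0$ as $t \to \infty$. Taking

$\tilde{\mu}_m^{(t)}(u, \cdot) = \mu_m^{(t)}(u, \cdot \cap E'_m(t))$,

we have

$\tilde{\mu}_m^{(t)}(u_t, \cdot) \stackrel{v}{\to} \mu_m(u, \cdot)$ in $M_+[0, \infty]^m$ $(t \to \infty)$.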

Proof. Note that we can just as well write $\mu_m(u, \cdot) = \mu_m(u, \cdot \cap E'_m)$. Suppose $m \ge 2$ and let
$f \in C_K^+[0, \infty]^m$. For $\delta > 0$, define $A_\delta = (\delta, \infty]^{m-1} \times [0, \infty]$, and choose $\delta$ such that $\mu_m(u, \partial A_\delta) = 0$.
On the one hand, for large $t$ we have

$\tilde{\mu}_m^{(t)}(u_t, \cdot)(f) = \int_{[0,\infty]^m} f(\mathbf{x})\, 1_{E'_m(t)}(\mathbf{x})\, \mu_m^{(t)}(u_t, d\mathbf{x}) \ge \int_{E'_m} f(\mathbf{x})\, 1_{A_\delta}(\mathbf{x})\, \mu_m^{(t)}(u_t, d\mathbf{x})$

$\to \int_{E'_m} f(\mathbf{x})\, 1_{A_\delta}(\mathbf{x})\, \mu_m(u, d\mathbf{x})$

as $t \to \infty$ by Lemma 8.3 (p. 22). Letting $\delta \downarrow 0$ yields

(3.10)  $\liminf_{t\to\infty} \tilde{\mu}_m^{(t)}(u_t, \cdot)(f) \ge \mu_m(u, \cdot)(f)$

by monotone convergence. On the other hand, fixing $\delta$, we can decompose the space according to
the first downcrossing of $\delta$:

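(3.11)  $\tilde{\mu}_m^{(t)}(u_t, \cdot)(f) = \int_{[0,\infty]^m} f(\mathbf{x})\, 1_{A_\delta}(\mathbf{x})\, \tilde{\mu}_m^{(t)}(u_t, d\mathbf{x}) + \sum_{k=1}^{m-1} \int_{[0,\infty]^m} f(\mathbf{x})\, 1_{A_\delta^k}(\mathbf{x})\, \tilde{\mu}_m^{(t)}(u_t, d\mathbf{x})$,

where $A_\delta^k = (\delta, \infty]^{k-1} \times [0, \delta] \times [0, \infty]^{m-k}$. On the subsets $A_\delta^k$ we appeal to the bound on $f$, say
$M$, to obtain

$\int_{[0,\infty]^m} f(\mathbf{x})\, 1_{A_\delta^k}(\mathbf{x})\, \tilde{\mu}_m^{(t)}(u_t, d\mathbf{x}) \le M\, \tilde{\mu}_m^{(t)}(u_t, A_\delta^k)$.

Now,

(3.12)  $\tilde{\mu}_m^{(t)}(u_t, A_\delta^k) \le \mu_k^{(t)}(u_t, (\delta, \infty]^{k-1} \times (y(t), \delta])$
        $= \mu_k^{(t)}(u_t, (\delta, \infty]^{k-1} \times [0, \delta]) - \mu_k^{(t)}(u_t, (\delta, \infty]^{k-1} \times [0, y(t)])$.

Considering the second term, we have

$\mu_k^{(t)}(u_t, (\delta, \infty]^{k-1} \times [0, y(t)])$
$= \int_{[0,\infty]} K(t u_t, t\,dx_1)\, 1_{(\delta,\infty]}(x_1) \cdots \int_{[0,\infty]} K(t x_{k-2}, t\,dx_{k-1})\, 1_{(\delta,\infty]}(x_{k-1})\, K(t x_{k-1}, t[0, y(t)])$
$= \int_{E'_{k-1}} \mu_{k-1}^{(t)}(u_t, d\mathbf{x}_{k-1})\, h_t(\mathbf{x}_{k-1})$,

where

$h_t(\mathbf{x}_{k-1}) = K(t x_{k-1}, t[0, y(t)])\, 1_{(\delta,\infty]^{k-1}}(\mathbf{x}_{k-1})$.

Moreover, if $\mathbf{x}^{(t)}_{k-1} \to \mathbf{x}_{k-1} \in (\delta, \infty]^{k-1}$, then

$h_t(\mathbf{x}^{(t)}_{k-1}) = K(t x^{(t)}_{k-1}, t[0, y(t)])\, 1_{(\delta,\infty]^{k-1}}(\mathbf{x}^{(t)}_{k-1}) \to G(\{0\})\, 1_{(\delta,\infty]^{k-1}}(\mathbf{x}_{k-1})$,

using the fact that $y(t)$ is a uniform extremal boundary. Since $\mu_{k-1}(u, \partial(\delta, \infty]^{k-1}) = 0$ without
loss of generality by choice of $\delta$, we conclude that

$\mu_k^{(t)}(u_t, (\delta, \infty]^{k-1} \times [0, y(t)]) \to G(\{0\}) \cdot \mu_{k-1}(u, (\delta, \infty]^{k-1}) = \mu_k(u, (\delta, \infty]^{k-1} \times \{0\})$

as $t \to \infty$. Now, let us return to (3.12). Given any $\varepsilon > 0$, by choosing $\delta$ small enough, we can make

$\mu_k^{(t)}(u_t, (\delta, \infty]^{k-1} \times (y(t), \delta]) \to \mu_k(u, (\delta, \infty]^{k-1} \times [0, \delta]) - \mu_k(u, (\delta, \infty]^{k-1} \times \{0\})$
$\le \mu_k(u, (0, \infty]^{k-1} \times [0, \delta]) - \mu_k(u, (\delta, \infty]^{k-1} \times \{0\})$
$< \mu_k(u, (0, \infty]^{k-1} \times \{0\}) + \varepsilon/2 - (\mu_k(u, (0, \infty]^{k-1} \times \{0\}) - \varepsilon/2) = \varepsilon$,

i.e.

(3.13)  $\limsup_{t\to\infty} \tilde{\mu}_m^{(t)}(u_t, A_\delta^k) < \varepsilon$

for $k = 1, \ldots, m-1$. Therefore, (3.11) implies that, given $\varepsilon' > 0$,

$\limsup_{t\to\infty} \tilde{\mu}_m^{(t)}(u_t, \cdot)(f) \le \int_{[0,\infty]^m} f(\mathbf{x})\, 1_{A_\delta}(\mathbf{x})\, \mu_m(u, d\mathbf{x}) + M \sum_{k=1}^{m-1} \limsup_{t\to\infty} \tilde{\mu}_m^{(t)}(u_t, A_\delta^k)$
$\le \mu_m(u, \cdot)(f) + \varepsilon'$

for small enough $\delta$. Combining this with (3.10) yields the result.  □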
3.3. The Extremal Component. Having thus formalized the distinction between extreme and
non-extreme states, we return to the question of phrasing a fdd limit result for $X$ when $H$ is
unspecified. The extremal boundary allows us to interpret the first hitting time of $\{0\}$ by the tail
chain as approximating the time of the first transition from extreme down to non-extreme. In this
terminology, Theorem 3.2 provides a result, given that such a transition has yet to occur.

Define the first hitting time of a non-extreme state

$\tau(t) = \inf\{n \ge 0 : X_n \le t\, y(t)\}$.

For a Markov chain started from $t u_t$, where $u_t \to u > 0$, we have $t u_t > t\, y(t)$ for large $t$, so $\tau(t)$ is
the first downcrossing of the extremal boundary.

For the tail chain $T$, put $\tau^* = \inf\{n \ge 0 : T_n = 0\}$. Given $T_0 > 0$, write $\tau^* = \inf\{n \ge 1 : \xi_n = 0\}$,
where $\{\xi_n\} \sim G$ are iid and independent of $T_0$, i.e. $\tau^*$ follows a Geometric distribution with
parameter $p = G(\{0\})$. Thus, $P[\tau^* = m] = p(1 - p)^{m-1}$ for $m \ge 1$ if $p > 0$, and $P[\tau^* = \infty] = 1$ if
$p = 0$. Theorem 3.2 becomes

(3.14)  $P_{t u_t}[t^{-1}\mathbf{X}_m \in \cdot\,,\ \tau(t) > m] \stackrel{v}{\to} P_u[\mathbf{T}_m \in \cdot\,,\ \tau^* > m]$,

implying that $\tau^*$ approximates $\tau(t)$:

(3.15)  $P_{t u_t}[\tau(t) \in \cdot\,] \Rightarrow P[\tau^* \in \cdot\,]$,  $(t \to \infty,\ u_t \to u > 0)$.

So if $G(\{0\}) > 0$, $X$ takes an average of approximately $G(\{0\})^{-1}$ steps to return to a non-extreme
state, but if $G(\{0\}) = 0$, $P_{t u_t}[\tau(t) \le m] \to 0$ for any $m \ge 1$, so starting from a larger and larger
initial state, it will take longer and longer for $X$ to cross down to a non-extreme state.
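
The geometric law of $\tau^*$ can be checked with a few lines of simulation. The sketch below is an illustration only: it draws the indicator of $\{\xi_n = 0\}$ directly, with an assumed value $p = G(\{0\}) = 0.25$, and compares the sample mean of $\tau^*$ with $1/p$. The function names are hypothetical.

```python
import random

def sample_tau_star(p_zero, rng, max_steps=10_000):
    """First n >= 1 with xi_n = 0, where the xi_n are iid and P[xi_n = 0] = p_zero."""
    for n in range(1, max_steps + 1):
        if rng.random() < p_zero:
            return n
    # Truncation guard; for p_zero > 0 this is essentially never reached.
    return max_steps

def mean_tau_star(p_zero, n_paths=20000, seed=1):
    """Sample mean of tau* over independent tail chain paths."""
    rng = random.Random(seed)
    return sum(sample_tau_star(p_zero, rng) for _ in range(n_paths)) / n_paths

# With p = 0.25 the sample mean should be close to 1/p = 4.
print(mean_tau_star(0.25))
```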

Let $T^*$ be the tail chain associated with $(G, \epsilon_0)$. For $\{\xi_n\} \sim G$ iid and independent of $T_0^*$,

(3.16)  $T_n^* = T_0^*\, \xi_1 \cdots \xi_n$.

We restate (3.14) in terms of a process derived from $X$, called the extremal component of $X$, whose
fdds converge weakly to those of $T^*$. The extremal component is the part of $X$ whose asymptotic
behavior is controlled by $G$ alone.


Definition. The extremal component of $X$ relative to $t$ is the process $X^{(t)}$ defined for $t > 0$ as

$X_n^{(t)} = X_n \cdot 1_{\{n < \tau(t)\}}$,  $n = 0, 1, \ldots$.
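
In computational terms the definition amounts to absorbing a path at zero from the first time it falls to or below the level $t\,y(t)$. A minimal sketch, in which the function name and the `threshold` argument (standing for $t\,y(t)$) are chosen here for illustration:

```python
def extremal_component(path, threshold):
    """Return the extremal component of a path (X_0, X_1, ...).

    The path is copied until the first index n with X_n <= threshold
    (the hitting time tau(t), with threshold = t * y(t)); from that index
    on, every entry is replaced by 0.
    """
    out = []
    absorbed = False
    for x in path:
        if not absorbed and x <= threshold:
            absorbed = True
        out.append(0.0 if absorbed else x)
    return out

# The path re-enters large values at the last step, but its extremal
# component stays at 0 once the threshold has been crossed.
print(extremal_component([5.0, 3.0, 0.5, 4.0], threshold=1.0))
```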


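Observe that $X^{(t)}$ is a Markov chain on $[0, \infty)$ with transition kernel

$K^{(t)}(x, A) = K(x, A \cap (t y(t), \infty)) + \epsilon_0(A) \cdot K(x, [0, t y(t)])$  if $x > t y(t)$,
$K^{(t)}(x, A) = \epsilon_0(A)$  if $x \le t y(t)$.

It follows that $K^{(t)}(t, t\,\cdot) \Rightarrow G$ as $t \to \infty$, and additionally that $K^{(t)}(t, \{0\}) \to G(\{0\})$.
The relation between the component processes $X^{(t)}$, $T^*$ and the complete ones is

$P_{t u_t}[t^{-1}\mathbf{X}_m^{(t)} \in \cdot \mid \tau(t) > m] = P_{t u_t}[t^{-1}\mathbf{X}_m \in \cdot \mid \tau(t) > m]$

and

$P_u[\mathbf{T}^*_m \in \cdot \mid \tau^* > m] = P_u[\mathbf{T}_m \in \cdot \mid \tau^* > m]$.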


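Theorem 3.3. Let $u_t = u(t) \ge 0$ satisfy $u_t \to u > 0$ as $t \to \infty$. Then on $[0, \infty]^m$,

$\pi_m^{(t)}(u_t, \cdot) := P_{t u_t}[t^{-1}(X_1^{(t)}, \ldots, X_m^{(t)}) \in \cdot\,] \Rightarrow P_u[(T_1^*, \ldots, T_m^*) \in \cdot\,]$  $(t \to \infty)$.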
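Proof. Suppose $m \ge 2$ and $f \in C[0, \infty]^m$, and assume first that $f \ge 0$. Then $f \in C_K^+[0, \infty]^m$ as
well, since the space is compact. Recall the notation of Theorem 3.2, and write $\pi_m(u, \cdot) = P_u[\mathbf{T}^*_m \in \cdot\,]$. Conditioning on $\tau(t)$, we can
write

$\pi_m^{(t)}(u_t, \cdot)(f) = \int_{(0,\infty]^m} f(\mathbf{x}_m)\, \pi_m^{(t)}(u_t, d\mathbf{x}_m) + \sum_{k=1}^{m} \int_{(0,\infty]^{k-1} \times \{0\}^{m-k+1}} f(\mathbf{x}_m)\, \pi_m^{(t)}(u_t, d\mathbf{x}_m)$

$= \int_{(0,\infty]^m} f(\mathbf{x}_m)\, \pi_m^{(t)}(u_t, d\mathbf{x}_m) + \sum_{k=1}^{m} \int_{(0,\infty]^{k-1} \times \{0\}} f(\mathbf{x}_k, 0, \ldots, 0)\, \pi_k^{(t)}(u_t, d\mathbf{x}_k)$

by the Markov property. Since

$\pi_m^{(t)}(u_t, \cdot \cap (0, \infty]^m) = P_{t u_t}[t^{-1}\mathbf{X}_m \in \cdot\,,\ \tau(t) > m] = P_{t u_t}[t^{-1}\mathbf{X}_m \in \cdot \cap (y(t), \infty]^m]$
$= \tilde{\mu}_{m+1}^{(t)}(u_t, \cdot \times [0, \infty])$,

the first term becomes

$\tilde{\mu}_{m+1}^{(t)}(u_t, \cdot)(f) \to \mu_{m+1}(u, \cdot)(f) = \int_{(0,\infty]^m} f(\mathbf{x}_m)\, \pi_m(u, d\mathbf{x}_m) = \int_{(0,\infty]^m} f(\mathbf{x}_m)\, P_u[\mathbf{T}^*_m \in d\mathbf{x}_m]$

as $t \to \infty$. Next, for any $A \subset [0, \infty]^k$ measurable, write $A_0 = \{\mathbf{x}_{k-1} : (\mathbf{x}_{k-1}, 0) \in A\} \subset [0, \infty]^{k-1}$,
and observe that

$\pi_k^{(t)}(u_t, A \cap (0, \infty]^{k-1} \times \{0\}) = P_{t u_t}[t^{-1}\mathbf{X}^{(t)}_{k-1} \in A_0 \cap (0, \infty]^{k-1},\ X_k^{(t)} = 0]$
$= P_{t u_t}[t^{-1}\mathbf{X}_{k-1} \in A_0 \cap (y(t), \infty]^{k-1},\ t^{-1}X_k \le y(t)]$
$= \tilde{\mu}_k^{(t)}(u_t, A_0 \times [0, \infty]) - \tilde{\mu}_{k+1}^{(t)}(u_t, A_0 \times [0, \infty]^2)$.

Applying this reasoning to the terms in the summation yields

$\int_{(0,\infty]^{k-1} \times \{0\}} f(\mathbf{x}_k, 0, \ldots, 0)\, \pi_k^{(t)}(u_t, d\mathbf{x}_k)$
$= \int_{[0,\infty]^k} f(\mathbf{x}_{k-1}, 0, \ldots, 0)\, \tilde{\mu}_k^{(t)}(u_t, d\mathbf{x}_k) - \int_{[0,\infty]^{k+1}} f(\mathbf{x}_{k-1}, 0, \ldots, 0)\, \tilde{\mu}_{k+1}^{(t)}(u_t, d\mathbf{x}_{k+1})$
$\to \int_{[0,\infty]^k} f(\mathbf{x}_{k-1}, 0, \ldots, 0)\, \mu_k(u, d\mathbf{x}_k) - \int_{[0,\infty]^{k+1}} f(\mathbf{x}_{k-1}, 0, \ldots, 0)\, \mu_{k+1}(u, d\mathbf{x}_{k+1})$
$= \int_{(0,\infty]^{k-1} \times \{0\}} f(\mathbf{x}_k, 0, \ldots, 0)\, \pi_k(u, d\mathbf{x}_k) = \int_{(0,\infty]^{k-1} \times \{0\}^{m-k+1}} f(\mathbf{x}_m)\, P_u[\mathbf{T}^*_m \in d\mathbf{x}_m]$.

Combining these limits shows that $E_{t u_t} f(t^{-1}\mathbf{X}^{(t)}_m) \to E_u f(\mathbf{T}^*_m)$ as $t \to \infty$. Finally, if $f$ is not
non-negative, then write $f = f_+ - f_-$. Since each of $f_+$ and $f_-$ is non-negative, bounded, and
continuous, we can apply the above argument to each.  □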


4.  The  Regularity  Condition 


Previous work on the tail chain derives fdd convergence of $X$ to $T^*$ under a single assumption
analogous to our domain of attraction condition (2.5). As we observed in Section 3.1, when
$G(\{0\}) = 0$, fdd convergence of $\{t^{-1}X\}$ follows directly, but when $G(\{0\}) > 0$, it was common to
assume an additional technical condition which made (2.5) imply fdd convergence to $T^*$ as well.
This condition, which we refer to as the "regularity condition," is an asymptotic convergence
assumption prescribing the boundary distribution to be $H = \epsilon_0$. We consider equivalences between
different forms appearing in the literature, in terms of both kernels and update functions, and show
that, under the regularity condition, the extremal behaviour of $X$ is asymptotically the same as
that of its extremal component $X^{(t)}$.
In cases where $G(\{0\}) > 0$, Perfekt [18, 19] requires that

(4.1)  $\lim_{\delta \downarrow 0} \limsup_{t\to\infty} \sup_{u \in [0,\delta]} K(tu, (t, \infty]) = 0$,

while Segers [23] stipulates that the chosen update function corresponding to $K$ must be of at most
linear order in the initial state:

(4.2)  $\limsup_{t\to\infty} \sup_{0 \le y \le t} t^{-1}\psi(y, v) < \infty$,  $(v \in B_0,\ P[V \in B_0] = 1)$.

Smith [24] used a variant of (4.1). We deem a formulation in terms of distributional convergence
to be instructive in our context.

Definition. A Markov transition kernel $K \in D(G)$ satisfies the regularity condition if

(4.3)  $K(t u_t, t\,\cdot) \Rightarrow \epsilon_0(\cdot)$

on $[0, \infty]$ as $t \to \infty$ for any non-negative function $u_t = u(t) \to 0$.

Note that in (2.7) we had $u_t \to u > 0$. We interpret (4.3) as designating the boundary
distribution $H$ to be $\epsilon_0$.
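
To make (4.3) concrete, consider again the additive update function $\psi(y, (Z, W)) = Zy + W$ and the particular choice $u_t = t^{-1/2} \to 0$. The sketch below estimates $K(t u_t, t(x, \infty])$ by simulation; the exponential laws for $Z$ and $W$ and the function name `prob_exceeds` are assumptions made for this illustration. Under (4.3) the estimate should vanish as $t$ grows.

```python
import random

def prob_exceeds(t, x=0.01, n=20000, seed=2):
    """Monte Carlo estimate of K(t*u_t, t*(x, inf]) with u_t = t**-0.5,
    for the update function psi(y, (Z, W)) = Z*y + W."""
    rng = random.Random(seed)
    start = t * t ** -0.5  # initial state t*u_t, which is o(t)
    count = 0
    for _ in range(n):
        z = rng.expovariate(1.0)  # assumed law of Z
        w = rng.expovariate(1.0)  # assumed law of W
        if z * start + w > t * x:
            count += 1
    return count / n

# The probability of jumping from a state of order o(t) to one of order t
# should decrease to 0.
for t in (1e2, 1e4, 1e8):
    print(t, prob_exceeds(t))
```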


We now consider the relationships between (4.1), (4.2) and (4.3), and propose an intuitive
equivalent for update functions in canonical form.

Proposition 4.1. Suppose $K \in D(G)$, and let $\psi(\cdot, V)$ be an update function corresponding to $K$
such that

(4.4)  $t^{-1}\psi(t, v) \to \xi(v)$

whenever $v \in B$ for which $P[V \in B] = 1$, and $\xi \circ V \sim G$. Then:

(a) Condition (4.1) is necessary and sufficient for $K$ to satisfy the regularity condition (4.3).

(b) Condition (4.2) is sufficient for $K$ to satisfy the regularity condition (4.3).

(c) If $\psi$ is in canonical form, i.e.

$\psi(y, (Z, W)) = Zy + \phi(y, W)$,

then $\psi$ satisfies (4.2) if and only if $\phi(\cdot, w)$ is bounded on any neighbourhood of 0 for each
$w \in C$, a set for which $P[W \in C] = 1$.


Proof. (a) Assume (4.1), and suppose $u_t \to 0$. We show $K(t u_t, t(x, \infty]) \to 0$ for any $x > 0$. Write

$\omega(t, \delta) = \sup_{u \in [0,\delta]} K(tu, (t, \infty])$.

Let $\varepsilon > 0$ be given, and choose $\delta$ small enough that $\limsup_{t\to\infty} \omega(t, \delta) < \varepsilon/2$. Then for $t$ large
enough that $u_t \le \delta x$, we have

$K(t u_t, t(x, \infty]) \le \sup_{u \in [0, \delta x]} K(tu, t(x, \infty]) = \omega(tx, \delta) < \limsup_{t\to\infty} \omega(t, \delta) + \varepsilon/2$

for $t$ large enough. Our choice of $\delta$ implies that $K(t u_t, t(x, \infty]) < \varepsilon$.

Conversely, assume that $K$ satisfies (4.3) but that (4.1) fails. Choose $\varepsilon > 0$ and a sequence
$\delta_n \downarrow 0$ such that $\limsup_{t\to\infty} \omega(t, \delta_n) > \varepsilon$ for $n = 1, 2, \ldots$. Then for each $n$ we can find a sequence
$t_k^{(n)} \to \infty$ as $k \to \infty$ such that $\omega(t_k^{(n)}, \delta_n) > \varepsilon$ for each $k$. Diagonalize to find $k_1 < k_2 < \cdots$ such that
$s_n = t_{k_n}^{(n)} \to \infty$ and $\omega(s_n, \delta_n) > \varepsilon$ for all $n$. Finally, for $n = 1, 2, \ldots$ choose $u_n \in [0, \delta_n]$ such that

$K(s_n u_n, (s_n, \infty]) > \omega(s_n, \delta_n) - \varepsilon/2$,

and put $u(t) = \sum_n u_n\, 1_{[s_n, s_{n+1})}(t)$. Clearly $u(t) \to 0$, but $K(s_n u(s_n), (s_n, \infty]) > \varepsilon/2$ for all $n$,
contradicting (4.3).
(b) Write $M(v) = \limsup_{t\to\infty} \sup_{0 \le y \le t} t^{-1}\psi(y, v)$. Since

$\sup_{0 \le y \le t} t^{-1}\psi(y, v) = \sup_{0 \le y \le \delta} \frac{\psi(t\delta^{-1} y, v)}{t\delta^{-1}}\, \delta^{-1}$

for $\delta > 0$, we have

$\limsup_{t\to\infty} \sup_{0 \le y \le \delta} t^{-1}\psi(ty, v) = \delta M(v)$.

Now, suppose $u_t \to 0$. Given any $\delta > 0$ we have

$t^{-1}\psi(t u_t, v) \le \sup_{0 \le y \le \delta} t^{-1}\psi(ty, v)$

provided $t$ is large enough, so $\limsup_t t^{-1}\psi(t u_t, v) \le \delta M(v)$. Consequently, $\limsup_t t^{-1}\psi(t u_t, v) = 0$
for every $v$ such that $M(v) < \infty$. Under (4.2), this means that $P[t^{-1}\psi(t u_t, V) \to 0] = 1$, implying
(4.3).

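(c) Suppose first that $\chi_w(a) = \sup_{0 \le y \le a} \phi(y, w) < \infty$ for all $a > 0$, whenever $w \in C$. Fixing
$w \in C$ and $z \ge 0$, note that

$\sup_{0 \le y \le t} t^{-1}\psi(y, (z, w)) \le z + \sup_{0 \le y \le t} t^{-1}\phi(y, w)$,

and observe for any $a > 0$ that

$\sup_{0 \le y \le t} t^{-1}\phi(y, w) \le \big(\sup_{0 \le y \le a} t^{-1}\phi(y, w)\big) \vee \big(\sup_{a \le y \le t} t^{-1}\phi(y, w)\big) \le t^{-1}\chi_w(a) \vee \big(\sup_{a \le y} y^{-1}\phi(y, w)\big)$.

Choosing $a$ large enough that $\sup_{a \le y} y^{-1}\phi(y, w) \le 1$, say, it follows that

$\limsup_{t\to\infty} \sup_{0 \le y \le t} t^{-1}\psi(y, (z, w)) \le z + 1$,

so $v = (z, w) \in B_0$. Therefore $P[(Z, W) \in B_0] \ge P[Z \ge 0,\ W \in C] = 1$.

Conversely, suppose there is a set $D$ with $P[W \in D] > 0$ such that $w \in D$ implies $\chi_w(a) = \infty$
for some $0 < a < \infty$. Since $\sup_{0 \le y \le t} t^{-1}\psi(y, (z, w)) \ge t^{-1}\chi_w(a) = \infty$ for $t \ge a$, we have $[0, \infty) \times D \subset B_0^c$,
contradicting (4.2).  □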

The exclusion of necessity from part (b) results from the fact that a kernel $K$ does not uniquely
specify an update function $\psi$. Even when $K$ satisfies the regularity condition (4.3), it may be
possible to choose a nasty update function $\psi$ which satisfies (4.4), but not (4.2). However, in such
cases there may exist a different update function $\psi'$ corresponding to $K$ which does satisfy (4.2).

Here is an example of such a situation. We exhibit an update function $\psi$ for which (i) (4.4) holds;
(ii) (4.2) fails because condition (c) in Proposition 4.1 fails; but yet (iii) the corresponding kernel
satisfies the regularity condition (4.3). Furthermore, we present a different choice of update function
corresponding to the same kernel which satisfies (4.2). Define $\psi(y, V = (Z, W)) = Zy + \phi(y, W)$,
where

$\phi(y, w) = \sum_{k=1}^{\infty} k \cdot 1_{\{yw = 1/k\}}$

and $W \sim U(0, 1)$. (i) Since $\phi(t, w) = 0$ for $t > 1/w$, it is clear that $\psi$ satisfies (4.4) with $\xi = Z$.
(ii) Observe that for any $w \in (0, 1)$, $\phi(\cdot, w)$ is unbounded on the interval $[0, 1]$. Therefore, by part
(c) of Proposition 4.1, (4.2) cannot hold for $\psi$. (iii) However, the corresponding kernel does satisfy
the regularity condition (4.3). Suppose $u_t \to 0$ and $a > 0$ is arbitrarily large. Write

$P[t^{-1}\psi(t u_t, (Z, W)) > x] = P[Z u_t + t^{-1}\phi(t u_t, W) > x] \le P[t^{-1}\phi(t u_t, W) > x'] + P[Z > a]$,

choosing $0 < x' < x - a u_t$. Since for any $t$, $\{w : \phi(t u_t, w) > t x'\} \subset \{(t u_t k)^{-1} : k = 1, 2, \ldots\}$, a
set of measure 0 with respect to $P[W \in \cdot\,]$, (4.3) follows by letting $a \to \infty$. On the other hand, the
update function $\psi'(y, Z) = Zy$ does satisfy (4.2), and for any $y$,

$P[\psi'(y, Z) \ne \psi(y, (Z, W))] = P[W \in \{(yk)^{-1} : k = 1, 2, \ldots\}] = 0$,

so $\psi'$ does indeed correspond to $K$.
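
The behaviour of the perturbation term in this example can be checked exactly with rational arithmetic. The sketch below assumes the form $\phi(y, w) = \sum_k k\, 1_{\{yw = 1/k\}}$ and evaluates it along the points $y = 1/(kw)$, which accumulate at 0; the function name `phi` is chosen here to mirror the notation and the value $w = 1/3$ is arbitrary.

```python
from fractions import Fraction

def phi(y, w):
    """phi(y, w) = sum over k >= 1 of k * 1{y*w = 1/k}.

    Returns k when y*w equals 1/k for a positive integer k, and 0 otherwise.
    Exact rational arithmetic avoids floating-point equality issues.
    """
    prod = Fraction(y) * Fraction(w)
    # A positive Fraction in lowest terms equals 1/k exactly when its numerator is 1.
    if prod > 0 and prod.numerator == 1:
        return prod.denominator
    return 0

w = Fraction(1, 3)
# Along y = 1/(k*w) = 3/k, which tends to 0, phi grows without bound.
print([phi(Fraction(3, k), w) for k in (1, 10, 100, 1000)])
# For y > 1/w = 3 the product y*w exceeds 1, so phi vanishes.
print(phi(4, w), phi(Fraction(7, 2), w))
```

The exceptional points form a countable set for each fixed $y$, which is why they are invisible to the kernel even though they destroy boundedness of $\phi(\cdot, w)$ near 0.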

The regularity condition (4.3) restricts attention to Markov chains for which the probability
of returning to an extreme state in the next $m$ steps after falling below the extremal boundary
is asymptotically negligible. For such chains, as well as those for which $y(t) \equiv 0$ is an extremal
boundary for $K$, $X$ has the same asymptotic behaviour as its extremal component, as described
next.

Theorem 4.1. Suppose $X \sim K$ with $K \in D(G)$, and let $\rho$ be a metric on $\mathbb{R}^m$. If $y(t) \equiv 0$ is an
extremal boundary for $K$, or if $K$ satisfies the regularity condition (4.3), then for any $\varepsilon > 0$ we
have

(4.5)  $P_{t u_t}\big[\rho\big(t^{-1}\mathbf{X}_m,\ t^{-1}\mathbf{X}_m^{(t)}\big) > \varepsilon\big] \to 0$  $(t \to \infty,\ u_t \to u > 0)$.

Consequently,

(4.6)  $P_{t u_t}\big[(X_1/t, \ldots, X_m/t) \in \cdot\,\big] \Rightarrow P_u[(T_1^*, \ldots, T_m^*) \in \cdot\,]$  $(t \to \infty,\ u_t \to u > 0)$.


First let us extend the regularity condition to higher-order transition kernels.

Lemma 4.1. If $K$ satisfies (4.3), then so do the $m$-step transition kernels $K^m$.

Proof. This is established by induction. Let $u_t \to 0$ and $f \in C[0, \infty]$. For $m \ge 2$, we have

$K^m(t u_t, t\,\cdot)(f) = \int_{[0,\infty]} K^{m-1}(t u_t, t\,dv) \int_{[0,\infty]} K(tv, t\,dx)\, f(x)$.

Assume that $K^{m-1}(t u_t, t\,\cdot) \Rightarrow \epsilon_0$; (4.3) implies that $\int K(t v_t, t\,dx)\, f(x) \to f(0)$ whenever $v_t \to 0$.
Therefore, by Lemma 8.2 (a) (p. 21), we conclude that

$K^m(t u_t, t\,\cdot)(f) \to f(0) = \epsilon_0(f)$.

Proof of Theorem 4.1. Suppose $\varepsilon > 0$ and $u_t \to u > 0$. Write

$P_{t u_t}\big[\rho(t^{-1}\mathbf{X}_m, t^{-1}\mathbf{X}_m^{(t)}) > \varepsilon\big] = \sum_{k=1}^{m} P_{t u_t}\big[\rho(t^{-1}\mathbf{X}_m, t^{-1}\mathbf{X}_m^{(t)}) > \varepsilon,\ \tau(t) = k\big]$.

Since $X_j = X_j^{(t)}$ while $j < \tau(t)$, for the $k$-th summand to converge to 0, it is sufficient that

$P_{t u_t}\big[|X_j^{(t)}/t - X_j/t| > \delta,\ \tau(t) = k\big] = P_{t u_t}[X_j/t > \delta,\ \tau(t) = k] \to 0$

for $j = k, \ldots, m$ and any $\delta > 0$. If $j = k$, we have

$P_{t u_t}[X_k/t > \delta,\ \tau(t) = k] \le P_{t u_t}[X_k/t > \delta,\ X_k/t \le y(t)] = 0$

for large $t$. For $j > k$, recalling the notation of Theorem 3.2,

$P_{t u_t}[X_j/t > \delta,\ \tau(t) = k] = \int_{E'_k(t)} 1_{[0, y(t)]}(x_k)\, P_{t u_t}[X_j/t > \delta \mid \mathbf{X}_k/t = \mathbf{x}_k]\, P_{t u_t}[\mathbf{X}_k/t \in d\mathbf{x}_k]$

$= \int_{[0,\infty]^k} P_{t x_k}[X_{j-k} > t\delta]\, 1_{[0, y(t)]}(x_k)\, \tilde{\mu}_k^{(t)}(u_t, d\mathbf{x}_k)$

using the Markov property. We claim that this integral $\to 0$ as $t \to \infty$. If $y(t) \equiv 0$, this
follows directly. Otherwise, recall that $\tilde{\mu}_k^{(t)}(u_t, \cdot) \stackrel{v}{\to} \mu_k(u, \cdot)$, and consider $h_t(\mathbf{x}_k) = P_{t x_k}[X_{j-k} >
t\delta]\, 1_{[0, y(t)]}(x_k)$. Suppose $\mathbf{x}^{(t)} \to \mathbf{x} \in [0, \infty]^k$. If $x_k > 0$, then $h_t(\mathbf{x}^{(t)}) = 0$ for large $t$ because $y(t) \to 0$.
Otherwise, if $x_k = 0$, we have $h_t(\mathbf{x}^{(t)}) \to 0$ since Lemma 4.1
implies that $P_{t x_k^{(t)}}[X_{j-k} > t\delta] \to 0$ as $t \to \infty$. Lemma 8.2 (b) establishes (4.5); (4.6) follows by Slutsky's theorem.  □

Therefore, $X$ converges to $T^*$ in fdds under (a) $G(\{0\}) = 0$, (b) $G(\{0\}) > 0$ combined with (4.3),
or (c) $G(\{0\}) > 0$ combined with the extremal boundary $y(t) \equiv 0$. In any of these cases, we will be able
to replace the extremal component $X^{(t)}$ with the complete chain $X$ in the results of Sections 5.1
and 5.2. However, the fact that $y(t) \equiv 0$ is an extremal boundary, and consequently that (4.6) holds, does
not imply that the regularity condition holds, regardless of $G(\{0\})$; in particular, a kernel for which
$G(\{0\}) = 0$ need not satisfy (4.3). This is illustrated in Example 6.3.


5.  Convergence  of  the  Unconditional  FDDs 

5.1. Effect of a Regularly Varying Initial Distribution. So far our convergence results
required that the initial state become large, and the only distributional assumption was that the
transition kernel $K$ determining $X$ be attracted to some distribution $G$. To obtain a result for the
unconditional distribution of $(X_0, \ldots, X_m)$, we require an additional assumption about how likely
the initial observation $X_0$ is to be large. Using Lemma 8.4, the results of the previous sections
extend to multivariate regular variation on the cone $E_m = (0, \infty] \times [0, \infty]^m$ when the distribution
of $X_0$ has a regularly varying tail. This cone is smaller than the cone $[0, \infty]^{m+1} \setminus \{0\}$ traditionally
employed in extreme value theory, because the kernel domain of attraction condition (2.5) is
uninformative when the initial state is not extreme. This is analogous to the setting of the Conditional
Extreme  Value  Model  considered  in  mm- 

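Proposition 5.1. Assume $X \sim K$ with $K \in D(G)$, and $X_0 \sim H$, where $H$ is a distribution on
$[0, \infty)$ with a regularly varying tail. This means that as $t \to \infty$, for some scaling function $b(t) \to \infty$,

$t H(b(t)\,\cdot) \stackrel{v}{\to} \nu_\alpha(\cdot)$ in $M_+(0, \infty]$,

where $\nu_\alpha(x, \infty] = x^{-\alpha}$ and $\alpha > 0$. Define the measure $\nu^*$ on $E_m = (0, \infty] \times [0, \infty]^m$ by

(5.1)  $\nu^*(dx_0, d\mathbf{x}_m) = \nu_\alpha(dx_0)\, P_{x_0}[(T_1^*, \ldots, T_m^*) \in d\mathbf{x}_m]$.

Then, for $m = 1, 2, \ldots$, the following convergences take place as $t \to \infty$:

(a) In $M_+((0, \infty]^m \times [0, \infty])$,

$t P[b(t)^{-1}(X_0, X_1, \ldots, X_m) \in \cdot \cap (0, \infty]^m \times [0, \infty]] \stackrel{v}{\to} \nu^*(\cdot \cap (0, \infty]^m \times [0, \infty])$.

(b) In $M_+(E_m)$,

$t P[b(t)^{-1}(X_0^{(b(t))}, X_1^{(b(t))}, \ldots, X_m^{(b(t))}) \in \cdot\,] \stackrel{v}{\to} \nu^*(\cdot)$.

(c) If either $G(\{0\}) = 0$, $y(t) \equiv 0$ is an extremal boundary, or $K$ satisfies the regularity
condition (4.3), then in $M_+(E_m)$,

$t P[b(t)^{-1}(X_0, X_1, \ldots, X_m) \in \cdot\,] \stackrel{v}{\to} \nu^*(\cdot)$.

(d) In $M_+(0, \infty]$,

$t P[X_0/b(t) \in dx_0,\ \tau(b(t)) \ge m] \stackrel{v}{\to} (1 - G(\{0\}))^{m-1} \cdot \nu_\alpha(dx_0)$.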

Remark. These convergence statements may be reformulated equivalently as, say,

$P[b(t)^{-1}(X_0, X_1, \ldots, X_m) \in \cdot \mid X_0 > b(t)] \Rightarrow P[(T_0^*, T_1^*, \ldots, T_m^*) \in \cdot\,]$,

where $T_0^* \sim$ Pareto($\alpha$). This is the form considered by Segers.

Proof. Apply Lemma 8.4 (p. 22) to the results of Theorems 3.1, 3.3 and 4.1 and (3.15).  □

In  the  case  m  =  1,  Ei  is  a  rotated  version  of  En  used  in  the  conditional  extreme  value  model  in 
mm  and  the  limit  can  be  expressed  as 

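$\nu^*((x_0, \infty] \times [0, x_1]) = \int_{x_0}^{\infty} \nu_\alpha(du)\, P[\xi \le x_1/u] = x_0^{-\alpha}\, P[\xi \le x_1/x_0] - x_1^{-\alpha}\, E\,\xi^\alpha 1_{\{\xi \le x_1/x_0\}}$

for $(x_0, x_1) \in (0, \infty] \times [0, \infty]$, where $\xi \sim G$ (with $E\xi^\alpha \le \infty$). Since

$\nu^*((x_0, \infty] \times \{0\}) = x_0^{-\alpha}\, P[\xi = 0]$ and $\nu^*((0, \infty] \times (x_1, \infty]) = x_1^{-\alpha}\, E\xi^\alpha$,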

sets on the $x_0$-axis incur mass proportional to $G(\{0\})$, and sets bounded away from this axis are
weighted according to $E\xi^\alpha$. A consequence of the second observation is that

$\liminf_{t\to\infty} t P[X_1/b(t) > x] \ge E\xi^\alpha \cdot x^{-\alpha}$.

Thus, knowledge concerning the tail behaviour of $X_1$ imposes a restriction on the distributions $G$
to which $K$ can be attracted via the $\alpha$-th moment. For example, if $t P[X_1/b(t) \in \cdot\,] \stackrel{v}{\to} \nu_\alpha$, then
we must have $E\xi^\alpha \le 1$; this property will be examined further in the next section and appears in
various forms in Segers [23] and Basrak and Segers [1], in the stationary setting.
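
As a sanity check on the closed form of $\nu^*((x_0, \infty] \times [0, x_1])$ in the case $m = 1$, the following sketch compares it with a direct numerical evaluation of the integral $\int_{x_0}^\infty \nu_\alpha(du)\, P[\xi \le x_1/u]$. The three-point law for $\xi$, the parameter values and the function names are assumptions made for the illustration; the substitution $s = u^{-\alpha}$ turns $\nu_\alpha(du)$ into Lebesgue measure on $(0, x_0^{-\alpha})$.

```python
def nu_star_numeric(x0, x1, alpha, atoms, n=20000):
    """Midpoint-rule value of the integral of P[xi <= x1/u] against nu_alpha(du)
    over (x0, inf), after the substitution s = u**(-alpha)."""
    upper = x0 ** (-alpha)
    h = upper / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        u = s ** (-1.0 / alpha)
        # P[xi <= x1/u] for a discrete law given as (value, probability) pairs.
        total += sum(p for v, p in atoms if v <= x1 / u) * h
    return total

def nu_star_closed(x0, x1, alpha, atoms):
    """x0^(-alpha) P[xi <= x1/x0] - x1^(-alpha) E[xi^alpha 1{xi <= x1/x0}], for x1 > 0."""
    r = x1 / x0
    prob = sum(p for v, p in atoms if v <= r)
    moment = sum(p * v ** alpha for v, p in atoms if v <= r)
    return x0 ** (-alpha) * prob - x1 ** (-alpha) * moment

# Assumed law of xi: mass 0.2 at 0, 0.5 at 0.5, 0.3 at 2.
atoms = [(0.0, 0.2), (0.5, 0.5), (2.0, 0.3)]
print(nu_star_numeric(1.0, 1.5, 2.0, atoms), nu_star_closed(1.0, 1.5, 2.0, atoms))
```

The two values agree up to the discretisation error of the midpoint rule, which here is confined to the single cell containing the jump of the integrand.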


5.2. Joint Tail Convergence. What additional assumptions are necessary for convergences (b)
and (c) of the previous result to take place on the larger cone $E_m^* = [0, \infty]^{m+1} \setminus \{0\}$? This was
considered by Segers [23] for stationary Markov chains. In (b), the dependence on the extremal
threshold and hence on $t$ means we are in the context of a triangular array and not, strictly speaking,
in the setting of joint regular variation. However, the result is still useful, for example, to derive a
point process convergence via the Poisson transform [21, p. 183].

As  a  first  step,  we  characterize  convergence  on  the  larger  cone  by  decomposing  it  into  smaller, 
more  familiar  cones.  This  is  similar  to  Theorem  6.1  in  [22]  and  one  of  the  implications  of  Theorem 
2.1  in  pQ.  As  a  convention  in  what  follows,  set  [0,oo]°  x  A  =  A.  Also,  recall  the  notation  Em  = 
(0,  oo]  x  [0,  oo]m. 

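Proposition 5.2. Suppose $\mathbf{Y}_t = (Y_{t,0}, Y_{t,1}, \ldots, Y_{t,m})$ is a random vector on $[0, \infty]^{m+1}$ for each
$t > 0$. Then there exists a non-null Radon measure $\mu^*$ on $E_m^* = [0, \infty]^{m+1} \setminus \{0\}$ such that

(5.2)  $t P[(Y_{t,0}, Y_{t,1}, \ldots, Y_{t,m}) \in \cdot\,] \stackrel{v}{\to} \mu^*(\cdot)$ in $M_+(E_m^*)$  $(t \to \infty)$

if and only if for $j = 0, \ldots, m$ there exist Radon measures $\mu_j$ on $E_j = (0, \infty] \times [0, \infty]^j$, not all null,
such that

(5.3)  $t P[(Y_{t,j}, \ldots, Y_{t,m}) \in \cdot\,] \stackrel{v}{\to} \mu_{m-j}(\cdot)$ in $M_+(E_{m-j})$.

The relation between the limit measures is the following:

$\mu_{m-j}(\cdot) = \mu^*([0, \infty]^j \times \cdot\,)$ on $E_{m-j}$

for $j = 0, \ldots, m$, and

$\mu^*([0, \mathbf{x}]^c) = \sum_{j=0}^{m} \mu_{m-j}((x_j, \infty] \times [0, x_{j+1}] \times \cdots \times [0, x_m])$ for $\mathbf{x} \in E_m^*$.

Furthermore, given $j \in \{0, \ldots, m-1\}$, if $A \subset [0, \infty]^{m-j} \setminus \{0\}^{m-j}$ is relatively compact, then
$\mu_{m-j}((0, \infty] \times A) < \infty$.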
Proof. Assume first that (5.2) holds. Fixing $j \in \{0, \ldots, m\}$, define $\mu_{m-j}(\cdot) := \mu^*([0, \infty]^j \times \cdot\,)$ (i.e.
$\mu_m = \mu^*$ on $E_m$). Let $A \subset E_{m-j}$ be relatively compact with $\mu_{m-j}(\partial A) = 0$. Then $A^* = [0, \infty]^j \times A$
is relatively compact in $E_m^*$, and $\partial_{E_m^*} A^* = [0, \infty]^j \times \partial_{E_{m-j}} A$, so $\mu^*(\partial_{E_m^*} A^*) = \mu_{m-j}(\partial A) = 0$.
Therefore,

$t P[(Y_{t,j}, \ldots, Y_{t,m}) \in A] = t P[(Y_{t,0}, \ldots, Y_{t,m}) \in A^*] \to \mu^*(A^*) = \mu_{m-j}(A)$,

establishing (5.3).

Conversely, suppose we have (5.3) for $j = 0, \ldots, m$. For $\mathbf{x} \in (0, \infty]^{m+1}$, define

$h(\mathbf{x}) = \sum_{j=0}^{m} \mu_{m-j}((x_j, \infty] \times [0, x_{j+1}] \times \cdots \times [0, x_m])$.

Decompose $[0, \mathbf{x}]^c$ as a disjoint union

(5.4)  $[0, \mathbf{x}]^c = \bigcup_{j=0}^{m} [0, \infty]^j \times (x_j, \infty] \times [0, x_{j+1}] \times \cdots \times [0, x_m]$,

and observe that at points of continuity of the limit,

(5.5)  $t P[\mathbf{Y}_t \in [0, \mathbf{x}]^c] = \sum_{j=0}^{m} t P[(Y_{t,j}, \ldots, Y_{t,m}) \in (x_j, \infty] \times [0, x_{j+1}] \times \cdots \times [0, x_m]] \to h(\mathbf{x})$.

Hence, (5.2) holds with the limit measure $\mu^*$ defined by $\mu^*([0, \mathbf{x}]^c) = h(\mathbf{x})$. Indeed, given $f \in C_K^+(E_m^*)$
we can find $\delta > 0$ such that $\mathbf{x}_\delta = (\delta, \ldots, \delta)$ is a continuity point of $h$ and $f$ is supported
on $[0, \mathbf{x}_\delta]^c$. Therefore,

$t\, E f(\mathbf{Y}_t) \le \sup_{\mathbf{x} \in E_m^*} f(\mathbf{x}) \cdot \sup_{t > 0} t P[\mathbf{Y}_t \in [0, \mathbf{x}_\delta]^c] < \infty$,

implying that the set $\{t P[\mathbf{Y}_t \in \cdot\,];\ t > 0\}$ is relatively compact in $M_+(E_m^*)$. Furthermore, if
$t_k P[\mathbf{Y}_{t_k} \in \cdot\,] \stackrel{v}{\to} \mu$ and $s_k P[\mathbf{Y}_{s_k} \in \cdot\,] \stackrel{v}{\to} \mu'$ as $k \to \infty$, then $\mu = \mu' = \mu^*$ on sets $[0, \mathbf{x}]^c$ which are
continuity sets of $\mu^*$ by (5.5). This extends to measurable rectangles in $E_m^*$ bounded away from 0
whose vertices are continuity points of $h$, leading us to the conclusion that $\mu = \mu' = \mu^*$ on $E_m^*$.

Moreover, since we can decompose $[0, \mathbf{x}]^c$ for any $\mathbf{x} \in E_m^*$ as in (5.4), it is clear that $\mu^*$ is non-null
iff not all of the $\mu_j$ are null.

Finally, for $1 \le j \le m - 1$, if $A \subset [0, \infty]^{m-j} \setminus \{0\}^{m-j}$ is relatively compact, then it is contained
in $[(0, \ldots, 0), (x_{j+1}, \ldots, x_m)]^c$ for some $(x_{j+1}, \ldots, x_m) \in (0, \infty]^{m-j}$. Applying (5.4) once again, we
find that

$\mu_{m-j}((0, \infty] \times A) = \mu^*([0, \infty]^j \times (0, \infty] \times A)$

$\le \sum_{k=j+1}^{m} \mu^*([0, \infty]^{j+1} \times [0, \infty]^{k-j-1} \times (x_k, \infty] \times [0, x_{k+1}] \times \cdots \times [0, x_m])$

$= \sum_{k=j+1}^{m} \mu_{m-k}((x_k, \infty] \times [0, x_{k+1}] \times \cdots \times [0, x_m]) < \infty$.

□ 


Consequently, the extension of the convergences in Proposition 5.1 to the larger cone $\mathbb{E}^*_m$ follows from regular variation of the marginal tails.

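Theorem 5.1. Suppose $\mathbf{X} \sim K \in D(G)$, and let $b(t) \to \infty$ be a scaling function and $\alpha > 0$. Then

$$t P\bigl[b(t)^{-1}\bigl(X_0^{(b(t))}, X_1^{(b(t))}, \dots, X_m^{(b(t))}\bigr) \in \cdot\,\bigr] \xrightarrow{v} \mu^*(\cdot) \quad \text{in } M_+(\mathbb{E}^*_m) \quad (t \to \infty), \tag{5.6}$$

where

$$\mu^*|_{\mathbb{E}_m}(dx_0, d\mathbf{x}) = \nu_\alpha(dx_0)\, P_{x_0}\bigl[(T^*_1, \dots, T^*_m) \in d\mathbf{x}\bigr] = \nu^*(dx_0, d\mathbf{x}),$$

if and only if

$$t P\bigl[X_j^{(b(t))}/b(t) \in \cdot\,\bigr] \xrightarrow{v} c_j\, \nu_\alpha(\cdot) \tag{5.7}$$

in $M_+(0, \infty]$, with $c_0 = 1$ and $(E\xi^\alpha)^j \le c_j < \infty$ for $j = 1, \dots, m$.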

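Proof. Assume first that (5.6) holds. It follows that

$$t P[X_0 > b(t)x] \longrightarrow \nu^*\bigl((x, \infty] \times [0, \infty]^m\bigr) = x^{-\alpha}$$

for $x > 0$. Hence, $b(t) \in RV_{1/\alpha}$, so by (5.6) again, we have for $j \ge 1$

$$t P\bigl[X_j^{(b(t))} > b(t)x\bigr] \longrightarrow \mu^*\bigl([0, \infty]^{j} \times (x, \infty] \times [0, \infty]^{m-j}\bigr) = c_j\, x^{-\alpha},$$

and

$$c_j \ge \mu^*\bigl((0, \infty] \times [0, \infty]^{j-1} \times (1, \infty] \times [0, \infty]^{m-j}\bigr) = \int_{(0, \infty]} \nu_\alpha(du)\, P[\xi_1 \cdots \xi_j > u^{-1}] = E(\xi_1 \cdots \xi_j)^\alpha = (E\xi^\alpha)^j.$$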
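The integral identity in the last display can be checked directly. The following is a short derivation, assuming only that $\nu_\alpha$ has density $\alpha u^{-\alpha-1}$ on $(0, \infty)$ (equivalently $\nu_\alpha((x, \infty]) = x^{-\alpha}$, as in the first display of the proof) and that $\xi_1, \dots, \xi_j$ are iid with distribution $G$. The symbol $\Pi = \xi_1 \cdots \xi_j$ is shorthand introduced here, not notation from the paper.

```latex
\begin{aligned}
\int_{(0,\infty]} \nu_\alpha(du)\, P[\Pi > u^{-1}]
  &= E \int_0^\infty \alpha u^{-\alpha-1}\, \mathbf{1}\{u > \Pi^{-1}\}\, du
     && \text{(Fubini)} \\
  &= E\bigl[(\Pi^{-1})^{-\alpha}\bigr] = E\,\Pi^{\alpha}
   = \prod_{i=1}^{j} E\,\xi_i^{\alpha} = (E\,\xi^{\alpha})^{j}
     && \text{(independence)}.
\end{aligned}
```

On the event $\{\Pi = 0\}$ the indicator vanishes for every finite $u$, which agrees with $\Pi^\alpha = 0$ there.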

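Conversely, suppose that (5.7) holds for $j = 0, \dots, m$. Lemma 8.4 implies that in $M_+(\mathbb{E}_{m-j})$,

$$t P\bigl[b(t)^{-1}\bigl(X_j^{(b(t))}, \dots, X_m^{(b(t))}\bigr) \in (dx_0, d\mathbf{x})\bigr] \xrightarrow{v} c_j\, \nu_\alpha(dx_0)\, P_{x_0}\bigl[(T^*_1, \dots, T^*_{m-j}) \in d\mathbf{x}\bigr] =: \mu_{m-j}\bigl((dx_0, d\mathbf{x})\bigr)$$

by the Markov property, and Proposition 5.2 yields (5.6), with $\mu^*|_{\mathbb{E}_m}(\cdot) = \mu_m(\cdot) = \nu^*(\cdot)$.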


□ 


Earlier, cases were outlined in which we could replace $X_j^{(t)}$ by $X_j$. Theorem 5.1 is most striking for these since it shows that for a Markov chain whose kernel is in a domain of attraction, to obtain joint regular variation of the fdds it is enough to know that the marginal tails are regularly varying. In particular, if $\mathbf{X}$ has a regularly varying stationary distribution then the fdds are jointly regularly varying. This result was presented by Segers [23], and Basrak and Segers [1] showed that for a general stationary process, joint regular variation of fdds is equivalent to the existence of a "tail process" which reduces to the tail chain in the case of Markov chains.


However,  what  Proposition  5.1  emphasizes  is  that  it  is  the  marginal  tail  behaviour  alone,  rather 
than  stationarity,  which  provides  the  link  with  joint  regular  variation. 


Theorem 5.1 also extends the observation made in Section 5.1 that knowledge of the marginal tail behaviour for a Markov chain whose kernel is in a domain of attraction constrains the class of possible limit distributions $G$ via its moments. If a particular choice of regularly varying initial distribution leads to $t P[X_j > b(t)\,\cdot\,] \xrightarrow{v} a_j\, \nu_\alpha(\cdot)$, then we have $E\xi^\alpha \le a_j^{1/j}$. In particular, if $\mathbf{X}$ admits a stationary distribution whose tail is $RV_{-\alpha}$, then $E\xi^\alpha \le 1$.

6.  Examples 

Our  first  example  illustrates  the  main  results. 

Example 6.1. Let $V = (Z, W)$ be any random vector on $[0, \infty) \times \mathbb{R}$. Consider the update function $\psi(y, V) = (Zy + W)_+$ and its canonical form

$$\psi(y, V) = Zy + \phi(y, W) = Zy + \bigl(W\, \mathbf{1}_{\{W > -Zy\}} - Zy\, \mathbf{1}_{\{W \le -Zy\}}\bigr).$$

For $y > 0$, the transition kernel has the form $K(y, (x, \infty)) = P[Zy + W > x]$. Since $t^{-1}\psi(t, V) = (Z + t^{-1}W)_+ \to Z$ a.s., we have $K \in D(G)$ with $G = P[Z \in \cdot\,]$. Furthermore, using Proposition 3.1, the function $\gamma(t) = \sqrt{t}$ is of larger order than $\phi(t, w)$, so $y(t) = 1/\sqrt{t}$ is an extremal boundary. Since $\phi(\cdot, w)$ is bounded on neighbourhoods of 0, Proposition 4.1 (c) implies $K$ satisfies the regularity condition (4.3). Consequently, from Theorem 4.1 we obtain fdd convergence of $t^{-1}\mathbf{X}$ to $\mathbf{T}^*$. □
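The convergence in Example 6.1 can be seen numerically by driving the rescaled chain and the tail chain with the same multipliers. The following is a minimal sketch, assuming purely for illustration that $Z$ is Exp(1), $W$ is standard normal and the two are independent (the example itself allows any law on $[0, \infty) \times \mathbb{R}$). The names `simulate_pair` and `max_gap` are hypothetical.

```python
import random

def simulate_pair(t, u, m, rng):
    """Run X_{n+1} = (Z X_n + W)_+ from X_0 = t*u for m steps, together with
    the tail chain T*_{n+1} = Z T*_n from T*_0 = u, using the same Z's.
    Returns (X_1/t, ..., X_m/t) and (T*_1, ..., T*_m)."""
    x, tail = t * u, u
    xs, tails = [], []
    for _ in range(m):
        z = rng.expovariate(1.0)   # assumed law of Z (illustrative only)
        w = rng.gauss(0.0, 1.0)    # assumed law of W (illustrative only)
        x = max(z * x + w, 0.0)
        tail = z * tail
        xs.append(x / t)
        tails.append(tail)
    return xs, tails

def max_gap(t, u=1.0, m=3, n_paths=2000, seed=1):
    """Largest coordinatewise distance between the rescaled chain and the
    tail chain over n_paths coupled sample paths."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_paths):
        xs, tails = simulate_pair(t, u, m, rng)
        worst = max(worst, max(abs(a - b) for a, b in zip(xs, tails)))
    return worst

for t in (10.0, 1e3, 1e6):
    print(t, max_gap(t))
```

Because the positive part is 1-Lipschitz, each step adds at most $|W|/t$ to the gap (after multiplication by later $Z$'s), so the printed gaps should shrink roughly like $1/t$.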


If $K$ does not satisfy the regularity condition (4.3), Theorem 4.1 may fail to hold and starting from $tu$, $t^{-1}\mathbf{X}$ may fail to converge to $\mathbf{T}^*$ started from $u$.


Example 6.2. Let $V = (Z, W, W')$ be any non-degenerate random vector on $[0, \infty)^3$, and consider the Markov chain determined by the update function

$$\psi(y, V) = Zy + W y^{-1}\, \mathbf{1}_{\{y > 0\}} + W'\, \mathbf{1}_{\{y = 0\}}.$$

For $y > 0$, the transition kernel is $K(y, (x, \infty)) = P[Zy + Wy^{-1} > x]$ and since $t^{-1}\psi(t, V) = Z + Wt^{-2} \to Z$ a.s., we have $K \in D(G)$ with $G = P[Z \in \cdot\,]$. Furthermore, using Proposition 3.1, the function $\gamma(t) = 1$ is of larger order than $\phi(t, w)$, so $y(t) = 1/t$ is an extremal boundary.

However, note that $\phi(y, (W, W')) = W y^{-1}\, \mathbf{1}_{\{y > 0\}} + W'\, \mathbf{1}_{\{y = 0\}}$ is unbounded near 0, implying that Segers' boundedness condition (4.2) does not hold. In fact, our form of the regularity condition (4.3) fails for $K$. Indeed,

$$K(tu_t, t(x, \infty)) = P[Z t u_t + W/(t u_t) > tx] = P[Z u_t + W/(t^2 u_t) > x].$$

Choosing $u_t = t^{-2}$ yields $K(tu_t, t(x, \infty)) \to P[W > x]$. For appropriate $x$, this shows (4.3) fails.
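The limit $P[W > x]$ along $u_t = t^{-2}$ can be made explicit in a special case. The following is a minimal sketch, assuming purely for illustration that $Z$ and $W$ are independent Exp(1) variables, in which case $P[Z t^{-2} + W > x]$ has a closed form obtained by conditioning on $W$. The function name `kernel_tail` is hypothetical.

```python
import math

def kernel_tail(t, x):
    """K(t*u_t, t*(x, inf)) = P[Z/t**2 + W > x] for u_t = t**-2, under the
    illustrative assumption that Z and W are independent Exp(1) variables.

    Conditioning on W gives
        P[W > x] + integral_0^x exp(-w) * exp(-t**2 * (x - w)) dw,
    which evaluates to the expression below (valid for t != 1).
    """
    assert t > 1.0 and x > 0.0
    return math.exp(-x) + (math.exp(-x) - math.exp(-t * t * x)) / (t * t - 1.0)

x = 1.0
for t in (2.0, 10.0, 100.0):
    # Second column approaches the third, P[W > x] = exp(-x), which is positive.
    print(t, kernel_tail(t, x), math.exp(-x))
```

Under these assumed laws the kernel mass on $t(x, \infty)$ stays above $e^{-x} > 0$ for every $t$, even though the starting point $t u_t = 1/t$ tends to 0.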


Not only does (4.3) fail but so does Theorem 4.1, since the asymptotic behaviour of $\mathbf{X}$ is not the same as that of $\mathbf{X}^{(t)}$. We show directly that the conditional fdds of $t^{-1}\mathbf{X}$ fail to converge to those of $\mathbf{T}^*$. The idea is that if $X_k < y(t) = 1/t$, there is a positive probability that $X_{k+1} > t$. We illustrate this for $m = 2$. Take $f \in C[0, \infty]^2$ and $u > 0$. Observe that if $X_0 = tu > 0$, then from the definition of $\psi$, $X_1 = Z_1 tu + W_1/(tu)$ and $X_2 = Z_2 X_1 + (W_2/X_1)\, \mathbf{1}_{\{X_1 > 0\}} + W_2'\, \mathbf{1}_{\{X_1 = 0\}}$. Furthermore, on $\{Z_1 > 0\}$, we have $X_1 > 0$ and $X_2 = Z_2 X_1 + W_2/X_1$. On $\{Z_1 = 0, W_1 > 0\}$, $X_1 > 0$ and $X_2 = Z_2 X_1 + W_2/X_1$. On $\{Z_1 = 0, W_1 = 0\}$, we have $X_1 = 0$ and $X_2 = W_2'$. Therefore


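$$\begin{aligned}
E_{tu} f(X_1/t, X_2/t) &= E_{tu} f(X_1/t, X_2/t)\, \mathbf{1}_{\{Z_1 > 0\}} + E_{tu} f(X_1/t, X_2/t)\, \mathbf{1}_{\{Z_1 = 0, W_1 > 0\}} \\
&\quad + E_{tu} f(X_1/t, X_2/t)\, \mathbf{1}_{\{Z_1 = 0, W_1 = 0\}} = A + B + C.
\end{aligned}$$

For $A$, as $t \to \infty$, we have

$$A = E f\bigl(Z_1 u + W_1/(t^2 u),\ Z_2[Z_1 u + W_1/(t^2 u)] + W_2/[Z_1 t^2 u + W_1/u]\bigr)\, \mathbf{1}_{\{Z_1 > 0\}} \longrightarrow E f(Z_1 u, Z_1 Z_2 u)\, \mathbf{1}_{\{Z_1 > 0\}},$$

while for $B$ we obtain for $t \to \infty$,

$$B = E f\bigl(W_1/(t^2 u),\ Z_2 W_1/(t^2 u) + W_2 u/W_1\bigr)\, \mathbf{1}_{\{Z_1 = 0, W_1 > 0\}} \longrightarrow E f(0, u W_2/W_1)\, \mathbf{1}_{\{Z_1 = 0, W_1 > 0\}}.$$

Finally for $C$,

$$C = E f(0, W_2'/t)\, \mathbf{1}_{\{Z_1 = 0, W_1 = 0\}} = P[Z_1 = 0, W_1 = 0]\, E f(0, W_2'/t) \longrightarrow P[Z_1 = 0, W_1 = 0]\, f(0, 0).$$

Observe that $\lim_{t \to \infty} [A + B + C] \ne E_u f(T^*_1, T^*_2) = E f(u Z_1, u Z_1 Z_2)$. □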
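The gap between $\lim (A + B + C)$ and the tail chain value comes entirely from the event $\{Z_1 = 0, W_1 > 0\}$, so it is visible only when $Z$ has an atom at 0. The following is a minimal Monte Carlo sketch, assuming purely for illustration that $Z = 0$ with probability 1/2 and is Exp(1) otherwise, that $W$ and $W'$ are Exp(1), that all components are independent, and that $f(x_1, x_2) = \min(x_2, 1)$. The names `draw_v`, `psi` and `compare` are hypothetical.

```python
import random

def f(x1, x2):
    # A bounded continuous test function on [0, inf]^2 (illustrative choice).
    return min(x2, 1.0)

def draw_v(rng):
    """One copy of V = (Z, W, W'). Assumed law: Z = 0 with probability 1/2 and
    Exp(1) otherwise; W and W' are Exp(1); all components independent."""
    z = 0.0 if rng.random() < 0.5 else rng.expovariate(1.0)
    return z, rng.expovariate(1.0), rng.expovariate(1.0)

def psi(y, v):
    """Update function of Example 6.2: Z*y + W/y for y > 0, and W' at y = 0."""
    z, w, w_prime = v
    return z * y + w / y if y > 0 else w_prime

def compare(t, u=1.0, n_paths=20000, seed=2):
    """Monte Carlo estimates of E_{tu} f(X_1/t, X_2/t) and E_u f(T*_1, T*_2),
    computed from the same draws of V_1 and V_2."""
    rng = random.Random(seed)
    chain_sum = tail_sum = 0.0
    for _ in range(n_paths):
        v1, v2 = draw_v(rng), draw_v(rng)
        x1 = psi(t * u, v1)
        x2 = psi(x1, v2)
        chain_sum += f(x1 / t, x2 / t)
        tail_sum += f(u * v1[0], u * v1[0] * v2[0])
    return chain_sum / n_paths, tail_sum / n_paths

chain, tail = compare(1000.0)
print(chain, tail, chain - tail)
```

Under these assumed laws and $u = 1$, the limiting gap is $P[Z = 0]\, E \min(W_2/W_1, 1) = \tfrac{1}{2} \ln 2 \approx 0.35$, so the two estimates should stay clearly separated however large $t$ is taken.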

In the final example, the conditional distributions of $t^{-1}\mathbf{X}$ converge to those of the tail chain $\mathbf{T}^*$, even though the regularity condition does not hold. This includes cases for which $G(\{0\}) = 0$ and $G(\{0\}) > 0$ with extremal boundary $y(t) = 0$.


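Example 6.3. Let $\{(\xi_j, \eta_j),\ j \ge 1\}$ be iid copies of the non-degenerate random vector $(\xi, \eta)$ on $[0, \infty)^2$. Taking $V = (\xi, \eta)$, consider a Markov chain which transitions according to the update function

$$\psi(y, V) = \xi(y + y^{-1})\, \mathbf{1}_{\{y > 0\}} + \eta\, \mathbf{1}_{\{y = 0\}} = \xi y + \bigl(\xi y^{-1}\, \mathbf{1}_{\{y > 0\}} + \eta\, \mathbf{1}_{\{y = 0\}}\bigr),$$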




where the last expression is the canonical form. For $y > 0$, the transition kernel is

$$K(y, [0, x]) = P[\xi(y + y^{-1}) \le x] = P[\xi \le x/(y + y^{-1})].$$

For $t > 0$, $t^{-1}\psi(t, V) = \xi(1 + t^{-2}) \to \xi$ a.s., so $K \in D(G)$ with $G = P[\xi \in \cdot\,]$. Note that $\phi(y, V) = \xi y^{-1}\, \mathbf{1}_{\{y > 0\}} + \eta\, \mathbf{1}_{\{y = 0\}}$ is unbounded near 0, implying that Segers' boundedness condition (4.2) does not hold. Also, our regularity condition (4.3) fails for $K$. To see this, write

$$K(tu_t, t(x, \infty)) = P\bigl[\xi > x/(u_t + (t^2 u_t)^{-1})\bigr].$$

Fix $x$ so that $P[\xi > x] > 0$ and choose $u_t = t^{-2}$. This yields $u_t + (t^2 u_t)^{-1} = 1 + t^{-2}$, implying that

$$K(tu_t, t(x, \infty)) = P\bigl[\xi > x/(1 + t^{-2})\bigr] \ge P[\xi > x] > 0,$$

so (4.3) fails for $K$. However, since $K(t, \{0\}) = P[\xi = 0] = G(\{0\})$, the choice $y(t) = 0$ satisfies the definition of an extremal boundary (3.6), even if $G(\{0\}) > 0$. This leads to fdd convergence of $P_{tu}[t^{-1}\mathbf{X} \in \cdot\,]$ to $P_u[\mathbf{T}^* \in \cdot\,]$, and thus we learn that the conclusion (4.6) of Theorem 4.1 may hold without (4.3) being true.

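We prove the fdd convergence for $m = 2$. For $u > 0$ and $X_0 = tu$, we have $X_1 = \xi_1(tu + (tu)^{-1})$ and $X_2 = \xi_2(X_1 + X_1^{-1})\, \mathbf{1}_{\{X_1 > 0\}} + \eta_2\, \mathbf{1}_{\{X_1 = 0\}}$. On $\{X_1 > 0\} = \{\xi_1 > 0\}$ we have $X_2 = \xi_2(X_1 + X_1^{-1})$. On $\{X_1 = 0\} = \{\xi_1 = 0\}$, we have $X_2 = \eta_2$. Thus, as $t \to \infty$,

$$\begin{aligned}
E_{tu} f(X_1/t, X_2/t)\, \mathbf{1}_{\{X_1 > 0\}} &= E f\bigl(\xi_1[u + (t^2 u)^{-1}],\ \xi_2 \xi_1[u + (t^2 u)^{-1}] + \xi_2/(\xi_1[t^2 u + 1/u])\bigr)\, \mathbf{1}_{\{\xi_1 > 0\}} \\
&\longrightarrow E f(\xi_1 u, \xi_1 \xi_2 u)\, \mathbf{1}_{\{\xi_1 > 0\}},
\end{aligned}$$

while

$$E_{tu} f(X_1/t, X_2/t)\, \mathbf{1}_{\{X_1 = 0\}} = E f(0, \eta_2/t)\, \mathbf{1}_{\{\xi_1 = 0\}} \longrightarrow P[\xi_1 = 0]\, f(0, 0).$$

We conclude that

$$E_{tu} f(X_1/t, X_2/t) \longrightarrow E f(u\xi_1, u\xi_1\xi_2) = E_u f(T^*_1, T^*_2). \qquad \Box$$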
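In contrast with Example 6.2, the chain of Example 6.3 tracks its tail chain even when $\xi$ has an atom at 0. The following is a minimal Monte Carlo sketch, assuming purely for illustration that $\xi = 0$ with probability 1/2 and is Exp(1) otherwise, that $\eta$ is Exp(1) and independent of $\xi$, and that $f(x_1, x_2) = \min(x_1 + x_2, 1)$. The names `draw_v`, `psi` and `compare` are hypothetical.

```python
import random

def f(x1, x2):
    # A bounded continuous test function on [0, inf]^2 (illustrative choice).
    return min(x1 + x2, 1.0)

def draw_v(rng):
    """One copy of V = (xi, eta). Assumed law: xi = 0 with probability 1/2 and
    Exp(1) otherwise; eta is Exp(1), independent of xi."""
    xi = 0.0 if rng.random() < 0.5 else rng.expovariate(1.0)
    return xi, rng.expovariate(1.0)

def psi(y, v):
    """Update function of Example 6.3: xi*(y + 1/y) for y > 0, and eta at y = 0."""
    xi, eta = v
    return xi * (y + 1.0 / y) if y > 0 else eta

def compare(t, u=1.0, n_paths=20000, seed=3):
    """Monte Carlo estimates of E_{tu} f(X_1/t, X_2/t) and E_u f(T*_1, T*_2),
    computed from the same draws of V_1 and V_2."""
    rng = random.Random(seed)
    chain_sum = tail_sum = 0.0
    for _ in range(n_paths):
        v1, v2 = draw_v(rng), draw_v(rng)
        x1 = psi(t * u, v1)
        x2 = psi(x1, v2)
        chain_sum += f(x1 / t, x2 / t)
        tail_sum += f(u * v1[0], u * v1[0] * v2[0])
    return chain_sum / n_paths, tail_sum / n_paths

for t in (10.0, 1000.0):
    print(t, compare(t))
```

Here a visit to 0 is followed by a jump to $\eta_2$, which is of order 1 and vanishes after division by $t$, so the two estimates should agree closely for large $t$.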


7.  Concluding  Remarks 

We  have  thus  placed  the  traditional  tail  chain  model  for  the  extremes  of  a  Markov  chain  in 
a  more  general  context  through  the  introduction  of  the  boundary  distribution  H  as  well  as  the 
extremal  boundary.  A  common  application  of  the  tail  chain  model  is  in  deriving  the  weak  limits 
of exceedance point processes for $\mathbf{X}$ [22]. We will shortly use our results to develop a
detailed  description  of  the  clustering  properties  of  extremes  of  Markov  chains  by  means  of  such  point 
processes.  Furthermore,  as  we  have  not  employed  stationarity  in  our  finite-dimensional  results,  we 
propose  to  substitute  the  inherent  regenerative  structure  of  a  Harris  recurrent  Markov  chain  for 
the  traditional  assumption  of  stationarity.  Also,  it  would  be  interesting  to  explore  the  implications 
of choices of $H$ other than $\epsilon_0$.


8.  Appendix:  Technical  Lemmas 


This section collects lemmas needed to prove convergence of integrals of the form $\int f_n\, d\mu_n$, assuming that $f_n \to f$ and $\mu_n \to \mu$ in their respective spaces. An example is the second continuous mapping theorem [Theorem 5.5, p. 34].

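Lemma 8.1. Assume $\mathbb{E}$ and $\mathbb{E}'$ are complete separable (cs) metric spaces, and for $n \ge 0$, $h_n : \mathbb{E} \to \mathbb{E}'$ are measurable. Put $A = \{x \in \mathbb{E} : h_n(x_n) \to h_0(x) \text{ whenever } x_n \to x\}$. If $P_n$, $n \ge 0$, are probability measures on $\mathbb{E}$ with $P_n \Rightarrow P_0$, and $h_n \to h_0$ almost uniformly in the sense that $P_0(A) = 1$, then $P_n \circ h_n^{-1} \Rightarrow P_0 \circ h_0^{-1}$ in $\mathbb{E}'$.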


The  result  provides  a  way  to  handle  the  convergence  of  a  family  of  integrals. 

Lemma 8.2. In addition to the assumptions of Lemma 8.1, require $\mathbb{E}' = \mathbb{R}$ and $\{h_n,\ n \ge 0\}$ is uniformly bounded, so that $\sup_{n \ge 0} \sup_{x \in \mathbb{E}} |h_n(x)| < \infty$.

(a) We have

$$\int_{\mathbb{E}} h_n\, dP_n \longrightarrow \int_{\mathbb{E}} h_0\, dP_0.$$

(b) Suppose additionally that $\mathbb{E}$ is locally compact with a countable base (lccb), and $\mu_n \xrightarrow{v} \mu_0$ in $M_+(\mathbb{E})$ with $\mu_0(A^c) = 0$. If there exists a compact set $B \in \mathcal{K}(\mathbb{E})$ with $\mu_0(\partial B) = 0$ such that $h_n(x) = 0$, $n \ge 0$, whenever $x \notin B$ (i.e. $B$ is a common compact support of each $h_n$), then

$$\int_{\mathbb{E}} h_n\, d\mu_n \longrightarrow \int_{\mathbb{E}} h_0\, d\mu_0.$$

Proof. (a) If $X_n \sim P_n$ for $n \ge 0$, then $h_n(X_n) \Rightarrow h_0(X_0)$. The uniform boundedness of the $h_n$ guarantees that $E h_n(X_n) \to E h_0(X_0)$.

(b) View $B$ as a compact subspace of $\mathbb{E}$ inheriting the relative topology. Then, assuming $\mu_0(B) > 0$ to rule out a trivial case, define probabilities on $B$ by $P_n(\cdot) = \mu_n(\cdot \cap B)/\mu_n(B)$, $n \ge 0$. Since $\mu_n(\cdot \cap B) \xrightarrow{v} \mu_0(\cdot \cap B)$ by Proposition 3.3 in [15], and $B$ is compact, we get $P_n \Rightarrow P_0$. Denote by $h_n'$, $n \ge 0$, the restriction of $h_n$ to $B$. Observe that for any $x \in A \cap B$, we have $h_n'(x_n) \to h_0'(x)$ whenever $x_n \to x$ in $B$, and $P_0(A^c \cap B) \le \mu_0(A^c)/\mu_0(B) = 0$. Therefore, apply part (a) to obtain

$$\int_{\mathbb{E}} h_n\, d\mu_n = \int_{\mathbb{E}} h_n \mathbf{1}_B\, d\mu_n = \mu_n(B) \int_B h_n'\, dP_n \longrightarrow \mu_0(B) \int_B h_0'\, dP_0 = \int_{\mathbb{E}} h_0\, d\mu_0.$$


□ 


A  convenient  specialization  of  Lemma  8.2  (b)  is  the  following. 


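Lemma 8.3. Suppose $\mathbb{E}$ is lccb and $\mu_n \xrightarrow{v} \mu$ in $M_+(\mathbb{E})$. If $f : \mathbb{E} \to \mathbb{R}$ is continuous and bounded, and $B \in \mathcal{B}(\mathbb{E})$ is relatively compact with $\mu(\partial B) = 0$, then

$$\int_B f\, d\mu_n \longrightarrow \int_B f\, d\mu.$$

Proof. Take $h_n = f \mathbf{1}_B$ for $n \ge 0$. Since $f \mathbf{1}_B$ is continuous except possibly on $\partial B$, we have $\mu(A^c) \le \mu(\partial B) = 0$.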

The  next  result  is  used  to  extend  convergence  of  substochastic  transition  functions  to  multivariate 
regular  variation  on  a  larger  space. 

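Lemma 8.4. Let $\mathbb{E} \subset [0, \infty]^m$ and $\mathbb{E}' \subset [0, \infty]^{m'}$ be two nice (lccb) spaces. Suppose for $t \ge 0$ that $\{p^{(t)}(\cdot, \cdot)\}_{t \ge 0}$ are substochastic transition functions on $\mathbb{E} \times \mathcal{B}(\mathbb{E}')$. This means $p^{(t)}(\cdot, B)$ is a measurable function for any fixed $B \in \mathcal{B}(\mathbb{E}')$, $p^{(t)}(x, \cdot)$ is a measure for any $x \in \mathbb{E}$, and $\sup_{t \ge 0} \sup_{u \in \mathbb{E}} p^{(t)}(u, \mathbb{E}') \le 1$. Assume there is a set $A \subset \mathbb{E}$ such that

$$p^{(t)}(u_t, \cdot) \xrightarrow{v} p^{(0)}(u, \cdot) \quad \text{in } M_+(\mathbb{E}') \quad (t \to \infty)$$

whenever $u_t \to u$ in $\mathbb{E}$ and $u \in A$. Suppose also that $\{\nu^{(t)}\}_{t \ge 0}$ are measures on $\mathbb{E}$ such that $\nu^{(0)}(A^c) = 0$, and $\nu^{(t)} \xrightarrow{v} \nu^{(0)}$ in $M_+(\mathbb{E})$. Then, defining measures $\mu^{(t)}$ for $t \ge 0$ on $\mathbb{E} \times \mathbb{E}'$ as

$$\mu^{(t)}(du, dx) = \nu^{(t)}(du)\, p^{(t)}(u, dx),$$

we have

$$\mu^{(t)} \xrightarrow{v} \mu^{(0)} \quad \text{in } M_+(\mathbb{E} \times \mathbb{E}') \quad (t \to \infty).$$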

Proof. Let $f \in C_K^+(\mathbb{E} \times \mathbb{E}')$; without loss of generality assume $f$ is supported on $K \times K'$, where $K \in \mathcal{K}(\mathbb{E})$ and $K' \in \mathcal{K}(\mathbb{E}')$. We have

$$\int_{\mathbb{E} \times \mathbb{E}'} \mu^{(t)}(du, dx)\, f(u, x) = \int_{\mathbb{E}} \nu^{(t)}(du) \int_{\mathbb{E}'} p^{(t)}(u, dx)\, f(u, x).$$

For $t \ge 0$, write

$$\varphi_t(u) = \int_{\mathbb{E}'} p^{(t)}(u, dx)\, f(u, x)$$

and suppose $u_t \to u_0$ with $u_0 \in A$; we verify that $\varphi_t(u_t) \to \varphi_0(u_0)$. Writing $g_t(x) = f(u_t, x)$, $t \ge 0$, we have $g_t(x_t) \to g_0(x_0)$ whenever $x_t \to x_0 \in \mathbb{E}'$ by the continuity of $f$. Also, the $g_t$ are uniformly bounded by the bound on $f$, and $g_t(x) = 0$ for all $t$ whenever $x \notin K'$. Furthermore, without loss of generality we can assume that $p^{(0)}(u_0, \partial K') = 0$. Now apply Lemma 8.2 (b) to obtain

$$\varphi_t(u_t) = \int_{\mathbb{E}'} p^{(t)}(u_t, dx)\, g_t(x) \longrightarrow \int_{\mathbb{E}'} p^{(0)}(u_0, dx)\, g_0(x) = \varphi_0(u_0).$$

Since the $p^{(t)}$ are substochastic, and $\varphi_t(u) = 0$ for all $t$ whenever $u \notin K$, the $\varphi_t$ are uniformly bounded by the bound on $f$. Assume similarly that $\nu^{(0)}(\partial K) = 0$, and recall that $\nu^{(0)}(A^c) = 0$. Apply Lemma 8.2 (b) once more to conclude as $t \to \infty$ that

$$\int_{\mathbb{E} \times \mathbb{E}'} \mu^{(t)}(du, dx)\, f(u, x) = \int_{\mathbb{E}} \nu^{(t)}(du)\, \varphi_t(u) \longrightarrow \int_{\mathbb{E}} \nu^{(0)}(du)\, \varphi_0(u) = \int_{\mathbb{E} \times \mathbb{E}'} \mu^{(0)}(du, dx)\, f(u, x). \qquad \Box$$


We  conclude  this  section  with  a  result  used  to  verify  the  existence  of  the  extremal  boundary. 


Lemma 8.5. Suppose $P_t$, $t \ge 0$, are probability measures on a cs metric space $\mathbb{E}$ such that $P_t \Rightarrow P_0$, and let $A \subset \mathbb{E}$ be measurable. Then there exists a sequence of sets $A_t \downarrow A$ such that $P_t(A_t) \to P_0(A)$.

Remark. Note that if $P_0(\partial A) = 0$ then we can take $A_t = A$. In the case of distribution functions $F_t \Rightarrow F$ on $\mathbb{R}^m$, taking $A = (-\infty, \mathbf{x}]$ and metric $\rho = \rho_\infty$ shows that for any $\mathbf{x} \in \mathbb{R}^m$ there exists $\mathbf{x}_t \downarrow \mathbf{x}$ such that $F_t(\mathbf{x}_t) \to F(\mathbf{x})$.
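The Remark matters precisely when the limit has an atom on the boundary of $A$. The following is a minimal one-dimensional sketch with an illustrative choice not taken from the paper: $P_t$ is the point mass at $1/t$ and the weak limit is the point mass at 0, so $F_t(0) = 0$ for every $t$ while $F(0) = 1$. The names `cdf_t`, `cdf_limit` and `x_t` are hypothetical.

```python
def cdf_t(t, x):
    """Distribution function of the point mass at 1/t (illustrative choice)."""
    return 1.0 if x >= 1.0 / t else 0.0

def cdf_limit(x):
    """Distribution function of the weak limit, the point mass at 0."""
    return 1.0 if x >= 0.0 else 0.0

def x_t(t):
    """A sequence decreasing to x = 0, slowly enough to stay above the atom at 1/t."""
    return 2.0 / t

for t in (1.0, 10.0, 1000.0):
    # Columns: t, F_t(0), F_t(x_t), F(0).
    print(t, cdf_t(t, 0.0), cdf_t(t, x_t(t)), cdf_limit(0.0))
```

Evaluating at the fixed point 0 misses the limiting mass for every $t$, whereas evaluating along $x_t = 2/t \downarrow 0$ recovers it.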

Proof. Let $\rho$ be a metric on $\mathbb{E}$, and consider sets $A_\delta = \{x : \rho(x, A) < \delta\}$. Recall that $P_0(\partial A_\delta) = 0$ for all but a countable number of choices of $\delta$, since $F(\delta) = P_0(A_\delta) - P_0(A)$ is a distribution function. First choose $\{\delta_k : k = 1, 2, \dots\}$ such that $0 < \delta_{k+1} < \delta_k \wedge 1/(k+1)$ and $P_0(\partial A_{\delta_k}) = 0$ for all $k$. Next, let $s_0 = 0$ and take $s_k \ge s_{k-1} + 1$, $k = 1, 2, \dots$, such that $P_t(A_{\delta_k}) > P_0(A) - 1/k$ whenever $t \ge s_k$; this is possible since $P_t(A_{\delta_k}) \to P_0(A_{\delta_k}) \ge P_0(A)$ for all $k$. Finally, for $t > 0$ set

$$A(t) = A_{\delta_1}\, \mathbf{1}_{(0, s_1)}(t) + \sum_{k=1}^{\infty} A_{\delta_k}\, \mathbf{1}_{[s_k, s_{k+1})}(t).$$

We claim that $A(t) \downarrow A$ and that $P_t(A(t)) \to P_0(A)$ as $t \to \infty$. It is clear that $A(t) \supset A(t')$ for $t < t'$, and $\bigcap_t A(t) = \bigcap_k A_{\delta_k} = A$. On the one hand, for large $t$ we have $A(t) \subset A_{\delta_k}$ for any $k$, so

$$\limsup_{t \to \infty} P_t(A(t)) \le \limsup_{t \to \infty} P_t(A_{\delta_k}) \le P_0(A_{\delta_k}).$$

Letting $k \to \infty$ shows that $\limsup_t P_t(A(t)) \le P_0(A)$. On the other hand, if $k(t)$ denotes the value of $k$ for which $s_k \le t < s_{k+1}$, then

$$P_t(A(t)) = P_t(A_{\delta_{k(t)}}) > P_0(A) - 1/k(t),$$

so $\liminf_t P_t(A(t)) \ge P_0(A)$. Combining these two inequalities shows that $P_t(A(t)) \to P_0(A)$. □